Abstract

We consider a backhaul-constrained coordinated cellular network. That is, a single-frequency network
with N+1 multi-antenna base stations (BSs) that cooperate in order to decode the users' data, and that are
linked by means of a common lossless backhaul, of limited capacity R. To implement receive cooperation,
we propose distributed compression: N BSs, upon receiving their signals, compress them using a multi-source
lossy compression code. Then, they send the compressed vectors to a central BS, which performs
users' decoding. Distributed Wyner-Ziv coding is proposed to be used, and is optimally designed in this
work. The first part of the paper is devoted to a network with a unique multi-antenna user, that transmits
a predefined Gaussian space-time codeword. For such a scenario, the compression codebooks at the BSs
are optimized, considering the user's achievable rate as the performance metric. In particular, for N = 1
the optimum codebook distribution is derived in closed form, while for N > 1 an iterative algorithm is
devised. The second part of the contribution focusses on the multi-user scenario. For it, the achievable
rate region is obtained by means of the optimum compression codebooks for sum-rate and weighted
sum-rate, respectively.
I. Introduction

Inter-cell interference is one of the most limiting factors of current cellular networks. It can be partially, 
but not totally, mitigated resorting to frequency-division multiplexing, sectorized antennas and fractional 
frequency reuse [1]. However, a more spectrally efficient solution has been recently proposed: coordinated 
cellular networks [2]. They consist of single-frequency networks with base stations (BSs) cooperating in 
order to transmit to and receive from the mobile terminals. Beamforming mechanisms are thus deployed 
in the downlink, as well as coherent detection in the uplink, to drastically augment the system capacity 
[3], [4]. Hereafter, we only focus on the uplink channel. 

Preliminary studies on the uplink performance of coordinated networks consider all BSs connected 
via a lossless backhaul with unlimited capacity [5] [6]. Accordingly, the capacity region of the network 
equals that of a MIMO multi-access channel, with a supra-receiver containing all the antennas of all 
cooperative BSs [7]. Such an assumption seems optimistic in the short-to-mid term, as operators are currently
worried about the costs of upgrading their backhaul to support, e.g., the HSPA traffic load. To deal with
a realistic backhaul constraint, two approaches have been proposed: i) distributed decoding [8], [9],
consisting of a demodulation scheme carried out in a distributed manner among BSs, based on local decisions and
belief propagation; decoding delay appears to be its main problem; ii) quantization [10], where BSs
quantize their observations and forward them to a decoding unit. Its main limitation is its inability
to exploit the signal correlation between antennas/BSs, which introduces redundancy into the backhaul.

This paper considers a new approach for the network: distributed compression. The cooperative BSs,
upon receiving their signals, compress them in a distributed manner using a multi-source lossy compression code [11].
Then, via the lossless backhaul, they transmit the compressed signals to the central unit (also a BS), which
decompresses them using its own received signal as side information, and finally uses them to estimate
the users' messages. Distributed compression has already been proposed for coordinated networks in
[12]-[14]. However, in those works, the authors consider single-antenna BSs with ergodic fading. We extend
the analysis here to the multiple-antenna case with time-invariant fading.

The compression of signals with side information at the decoder was introduced by Wyner and Ziv in
[15], [16]. They show that, when compressing a single Gaussian source, side information at the encoder
is useless (i.e., the rate-distortion tradeoff remains unchanged) when it is available at the decoder [16,
Section 3]. Unfortunately, when considering multiple (correlated) signals, independently compressed at
different BSs, and to be recovered at a central unit with side information, such a statement cannot be
claimed. Indeed, this is an open problem, for which it is not even clear when source-channel separation
applies [17]. To the best of the authors' knowledge, the scheme that performs best (in a rate-distortion sense)
for this problem is Distributed Wyner-Ziv (D-WZ) compression [18]. Such a compression is the direct
extension of Berger-Tung coding to the decoding side information case [19], [20]. In turn, Berger-Tung
compression can be thought of as the lossy counterpart of the Slepian-Wolf lossless coding [21]. D-WZ
coding is thus the compression scheme proposed to be used, and is detailed in the sequel.

Summary of Contributions. This paper considers a single-frequency network with $N + 1$ multi-antenna
BSs. The first base station, denoted $\mathrm{BS}_0$, is the central unit and centralizes the users' decoding.
The rest, $\mathrm{BS}_1, \cdots, \mathrm{BS}_N$, are cooperative BSs, which compress their received signals in a
distributed manner using a D-WZ code, and independently transmit them to $\mathrm{BS}_0$ via the common backhaul
of aggregate capacity $R$. In the network, time-invariant, frequency-flat channels are assumed, as well as transmit
and receive channel state information (CSI) at the users and BSs, respectively.

The first part of the paper is devoted to a network with a single user, equipped with multiple antennas. 
It aims at deriving the optimum compression codebooks at the BSs, for which the user's transmission 
rate is maximized. Our contributions are the following: 

• First, Sec. II revisits Wyner-Ziv coding [16, Section 3] and Distributed Wyner-Ziv coding [19], and 
adapts them to our compression scenario. 

• For the single user transmitting a given Gaussian codeword, Sec. III proves that the optimum
compression codebooks at the BSs are Gaussian distributed. Accordingly, the compression step 
is modelled by means of Gaussian "compression" noise, added by the BSs on their observations 
before retransmitting them to the central unit. 

• Considering a unique cooperative BS in the network (i.e., N = 1), Sec. IV derives in closed form the 
optimum "compression" noise for which the user's rate is maximized. We also show that conditional 
Karhunen-Loeve transform plus independent Wyner-Ziv coding of scalar streams is optimal. 

• The compression design is extended in Sec. V to arbitrary N BSs. The optimum "compression" 
noises (i.e., the optimum codebook distributions) are obtained by means of an iterative algorithm, 
constructed using dual decomposition theory and a non-linear block coordinate approach [22], [23]. 
Due to the non-convexity of the noises optimization, only local convergence is proven. 

The second part of the paper extends the analysis to a network where multiple users transmit simulta- 
neously. For it, the achievable rate region is described resorting to the weighted sum-rate optimization: 

• First, the sum-rate of the network is derived in Sec. VI, adapting the previous single-user results. Later,
the weighted sum-rate, and its associated optimum compression "noises", are obtained by means of
an iterative algorithm, constructed using dual decomposition and Gradient Projection [23].
Notation. $E\{\cdot\}$ denotes expectation. $A^T$, $A^\dagger$ and $a^*$ stand for the transpose of $A$, conjugate transpose
of $A$ and complex conjugate of $a$, respectively. $[a]^+ = \max\{a, 0\}$. $I(\cdot;\cdot)$ denotes mutual information,
$H(\cdot)$ entropy. The derivative of a scalar function $f(\cdot)$ with respect to a complex matrix $X$
is defined as in [24], i.e., = Q £l . In such a way, e.g., ^qx^ = ^ "• Moreover, we 

compactly write $Y_{1:N} = \{Y_1, \cdots, Y_N\}$, $Y_{\mathcal{G}} = \{Y_i \,|\, i \in \mathcal{G}\}$ and $Y_{\mathcal{G}}^c = \{Y_i \,|\, i \notin \mathcal{G}\}$. A sequence of
vectors $\{Y_i^t\}_{t=1}^n$ is compactly denoted by $Y_i^n$. Furthermore, to define block-diagonal matrices, we state
$\mathrm{diag}(A_1, \cdots, A_n)$, with $A_i$ square matrices. $\mathrm{coh}(\cdot)$ stands for convex hull. Finally, the covariance
of random vector $X$ conditioned on random vector $Y$ is denoted by $R_{X|Y}$ and computed as
$R_{X|Y} = E\{(X - E\{X|Y\})(X - E\{X|Y\})^\dagger \,|\, Y\}$.

II. Compression of Vector Sources 

The aim of compression within coordinated networks is to let the decoder extract as much mutual
information as possible from the reconstructed signals. Known rate-distortion results apply to this goal as follows.

A. Single-Source Compression with Decoder Side Information 

Consider Fig. 1 with $N = 1$. Let $Y_1^n$ be a zero-mean, temporally memoryless, Gaussian vector to be
compressed at $\mathrm{BS}_1$. Assume that it is the observation of the signal transmitted by user $s$, i.e., $X_s^n$. $\mathrm{BS}_1$
compresses the signal and sends it to $\mathrm{BS}_0$, which makes use of its side information $Y_0^n$ to decompress
it. Finally, once the signal has been reconstructed into vector $\hat{Y}_1^n$, the decoder uses it to estimate the message
transmitted by the user. Wyner's results [16] apply to this problem as follows.

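Definition 1 (Single-source Compression Code): A $(n, 2^{n\rho})$ compression code with side information
at the decoder $Y_0$ is defined by two mappings, $f_n(\cdot)$ and $g_n(\cdot)$, and three spaces $\mathcal{Y}_1$, $\hat{\mathcal{Y}}_1$ and $\mathcal{Y}_0$, where

$$f_n : \mathcal{Y}_1^n \rightarrow \{1, \cdots, 2^{n\rho}\}$$

$$g_n : \{1, \cdots, 2^{n\rho}\} \times \mathcal{Y}_0^n \rightarrow \hat{\mathcal{Y}}_1^n.$$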
Proposition 1 (Wyner-Ziv Coding [16]): Let the random vector $\hat{Y}_1$ with conditional probability $p(\hat{Y}_1|Y_1)$
satisfy the Markov chain $Y_0 \rightarrow Y_1 \rightarrow \hat{Y}_1$, and let $Y_0$ and $Y_1$ be jointly Gaussian. Then, considering a
sequence of compression codes $(n, 2^{n\rho})$ with side information $Y_0$ at the decoder:

$$\frac{1}{n} I\left(X_s^n; Y_0^n, g_n\left(Y_0^n, f_n(Y_1^n)\right)\right) = I\left(X_s; Y_0, \hat{Y}_1\right) \tag{1}$$

as $n \rightarrow \infty$ if:
• the compression rate $\rho$ satisfies

$$I\left(Y_1; \hat{Y}_1 | Y_0\right) \leq \rho, \tag{2}$$

• the compression codebook $\mathcal{C}$ consists of $2^{n\rho}$ random sequences $\hat{Y}_1^n$ drawn i.i.d. from $\prod_{t=1}^n p(\hat{Y}_1)$,
where $p(\hat{Y}_1) = \sum_{Y_1} p(Y_1)\, p(\hat{Y}_1|Y_1)$,

• the encoding $f_n(\cdot)$ outputs the bin-index of codewords $\hat{Y}_1^n$ that are jointly typical with the source
sequence $Y_1^n$. In turn, $g_n(\cdot)$ outputs the codeword $\hat{Y}_1^n$ that, belonging to the bin selected by the
encoder, is jointly typical with $Y_0^n$.

Proof: The proposition is proven in [16, Lemma 5] using joint typicality arguments. ■ 

B. Multiple-Source Compression with Decoder Side Information 

Consider Fig. 1. Let $Y_i^n$, $i = 1, \cdots, N$ be $N$ zero-mean, temporally memoryless, Gaussian vectors
to be compressed independently at $\mathrm{BS}_1, \cdots, \mathrm{BS}_N$, respectively. Assume that they are the observations
at the BSs of the signal transmitted by user $s$, i.e., $X_s^n$. The compressed vectors are sent to $\mathrm{BS}_0$, which
decompresses them using its side information $Y_0^n$ and uses them to estimate the user's message. Notice
that the architecture in Fig. 1 imposes source-channel separation at the compression step, which has not
been shown to be optimal. However, it includes the coding scheme with the best known performance: Distributed
Wyner-Ziv coding [18]. It applies to the setup as follows.

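Definition 2 (Multiple-source Compression Code): A $(n, 2^{n\rho_1}, \cdots, 2^{n\rho_N})$ compression code with side
information at the decoder $Y_0$ is defined by $N + 1$ mappings, $f_n^i(\cdot)$, $i = 1, \cdots, N$, and $g_n(\cdot)$, and $2N + 1$
spaces $\mathcal{Y}_i$, $\hat{\mathcal{Y}}_i$, $i = 1, \cdots, N$ and $\mathcal{Y}_0$, where

$$f_n^i : \mathcal{Y}_i^n \rightarrow \{1, \cdots, 2^{n\rho_i}\}, \quad i = 1, \cdots, N$$

$$g_n : \{1, \cdots, 2^{n\rho_1}\} \times \cdots \times \{1, \cdots, 2^{n\rho_N}\} \times \mathcal{Y}_0^n \rightarrow \hat{\mathcal{Y}}_1^n \times \cdots \times \hat{\mathcal{Y}}_N^n.$$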
Proposition 2 (Distributed Wyner-Ziv Coding [18]): Let the random vectors $\hat{Y}_i$, $i = 1, \cdots, N$, have
conditional probability $p(\hat{Y}_i|Y_i)$ and satisfy the Markov chain $(Y_0, Y_i^c, \hat{Y}_i^c) \rightarrow Y_i \rightarrow \hat{Y}_i$. Let $Y_0$ and $Y_i$,
$i = 1, \cdots, N$ be jointly Gaussian. Then, considering a sequence of compression codes $(n, 2^{n\rho_1}, \cdots, 2^{n\rho_N})$
with side information $Y_0$ at the decoder:

$$\frac{1}{n} I\left(X_s^n; Y_0^n, g_n\left(Y_0^n, f_n^1(Y_1^n), \cdots, f_n^N(Y_N^n)\right)\right) = I\left(X_s; Y_0, \hat{Y}_{1:N}\right) \tag{3}$$

as $n \rightarrow \infty$ if:

• the compression rates $\rho_1, \cdots, \rho_N$ satisfy

$$I\left(Y_{\mathcal{G}}; \hat{Y}_{\mathcal{G}} | Y_0, \hat{Y}_{\mathcal{G}}^c\right) \leq \sum_{i \in \mathcal{G}} \rho_i \quad \forall \mathcal{G} \subseteq \{1, \cdots, N\}, \tag{4}$$

• each compression codebook $\mathcal{C}_i$, $i = 1, \cdots, N$ consists of $2^{n\rho_i}$ random sequences $\hat{Y}_i^n$ drawn i.i.d.
from $\prod_{t=1}^n p(\hat{Y}_i)$, where $p(\hat{Y}_i) = \sum_{Y_i} p(Y_i)\, p(\hat{Y}_i|Y_i)$,

• for every $i = 1, \cdots, N$, the encoding $f_n^i(\cdot)$ outputs the bin-index of codewords $\hat{Y}_i^n$ that are jointly
typical with the source sequence $Y_i^n$. In turn, $g_n(\cdot)$ outputs the codewords $\hat{Y}_i^n$, $i = 1, \cdots, N$ that,
belonging to the bins selected by the encoders, are all jointly typical with $Y_0^n$.

Proof: The proposition is proven for discrete sources and discrete side information in [18, Theorem 
2]. Also, the extension to the Gaussian case is conjectured therein. The conjecture can be proven by noting 
that D-WZ coding is equivalent to Berger-Tung coding with side information at the decoder [19]. In turn, 
Berger-Tung coding can be implemented through time-sharing of successive Wyner-Ziv compressions 
[20], for which introducing side information Yq at the decoder reduces the compression rate as in (4). 
Due to space limitations, we limit the proof to this sketch. ■ 
Now, we can present the coordinated cellular network with D-WZ coding. 

III. System Model 

Let a single source $s$, equipped with $N_t$ antennas, transmit data to base stations $\mathrm{BS}_0, \cdots, \mathrm{BS}_N$, each
one equipped with $N_i$, $i = 1, \cdots, N$ antennas. The BSs, as in typical 3G networks, are connected
(through radio network controllers) to a common lossless backhaul of aggregate capacity $R$, and $\mathrm{BS}_0$ is
selected to be the decoding unit. This user-to-BSs assignment is assumed to be given by upper layers
and is out of the scope of the paper$^1$.

The source transmits a message $\omega \in \{1, \cdots, 2^{nR_s}\}$ mapped onto a zero-mean, Gaussian codeword
$X_s^n$, drawn i.i.d. from random vector $X_s \sim \mathcal{CN}(0, Q)$ and not subject to optimization. The transmitted
signal, affected by time-invariant, memoryless fading, is received at the BSs under additive noise:

$$Y_i^n = H_{s,i} \cdot X_s^n + Z_i^n, \quad i = 0, \cdots, N \tag{5}$$

where $H_{s,i}$ is the MIMO channel matrix between user $s$ and $\mathrm{BS}_i$, and $Z_i \sim \mathcal{CN}(0, \sigma_r^2 I)$ is AWGN.
Channel coefficients are known at both the BSs and at the user, while $\mathrm{BS}_0$ has centralized knowledge of
all the channels within the network.

$^1$ The derivation of the optimum set of BSs to decode the user is out of the scope of our study. We refer the reader to, e.g., [6]
for assignment algorithms and selection criteria.
A. Problem Statement

Base stations $\mathrm{BS}_1, \cdots, \mathrm{BS}_N$, upon receiving their signals, compress them in a distributed manner using a D-WZ
compression code. Later, they transmit the compressed vectors to $\mathrm{BS}_0$, which recovers them and uses
them to decode. Under this scheme, the user's message can be reliably decoded if and only if [12, Theorem 1]:

$$R_s < \lim_{n \rightarrow \infty} \frac{1}{n} I\left(X_s^n; Y_0^n, g_n\left(Y_0^n, f_n^1(Y_1^n), \cdots, f_n^N(Y_N^n)\right)\right) = I\left(X_s; Y_0, \hat{Y}_{1:N}\right). \tag{6}$$

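The second equality follows from (3) in Prop. 2. However, equality only holds for compression rates satisfying
the set of constraints (4). As mentioned, in the backhaul there is only an aggregate rate constraint $R$,
i.e., $\sum_{i \in \mathcal{G}} \rho_i \leq R$, $\mathcal{G} = \{1, \cdots, N\}$. Therefore, the set of constraints (4) can all be re-stated as:

$$I\left(Y_{\mathcal{G}}; \hat{Y}_{\mathcal{G}} | Y_0, \hat{Y}_{\mathcal{G}}^c\right) \leq R \quad \forall \mathcal{G} \subseteq \{1, \cdots, N\}. \tag{7}$$

Furthermore, from the Markov chain in Prop. 2, the following inequality holds

$$I\left(Y_{\mathcal{G}}; \hat{Y}_{\mathcal{G}} | Y_0, \hat{Y}_{\mathcal{G}}^c\right) \leq I\left(Y_{1:N}; \hat{Y}_{1:N} | Y_0\right) \quad \forall \mathcal{G} \subseteq \{1, \cdots, N\}. \tag{8}$$

Therefore, forcing the constraint $I(Y_{1:N}; \hat{Y}_{1:N} | Y_0) \leq R$ to hold makes all constraints in (7) hold too.
Accordingly, the maximum transmission rate $C$ of user $s$ is obtained from the optimization:

$$C = \max_{\prod_{i=1}^N p(\hat{Y}_i|Y_i)} I\left(X_s; Y_0, \hat{Y}_{1:N}\right) \tag{9}$$

$$\mathrm{s.t.} \quad I\left(Y_{1:N}; \hat{Y}_{1:N} | Y_0\right) \leq R.$$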

Theorem 1: Let $X_s \sim \mathcal{CN}(0, Q)$. Optimization (9) is solved for Gaussian conditional distributions
$p(\hat{Y}_i|Y_i)$, $i = 1, \cdots, N$. Thus, the compressed vectors can be modelled as $\hat{Y}_i = Y_i + Z_i^c$, where
$Z_i^c \sim \mathcal{CN}(0, \Phi_i)$ is independent, Gaussian, "compression" noise at $\mathrm{BS}_i$. That is,

$$C = \max_{\Phi_1, \cdots, \Phi_N \succeq 0} \log\det\left(I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q \sum_{n=1}^N H_{s,n}^\dagger \left(\sigma_r^2 I + \Phi_n\right)^{-1} H_{s,n}\right) \tag{10}$$

$$\mathrm{s.t.} \quad \log\det\left(I + \mathrm{diag}\left(\Phi_1^{-1}, \cdots, \Phi_N^{-1}\right) R_{Y_{1:N}|Y_0}\right) \leq R,$$

where the conditional covariance $R_{Y_{1:N}|Y_0}$ follows (54).

Proof: See Appendix II for the proof. ■
Remark 1: The maximization above is not concave in standard form: although the feasible set is
convex, the objective function is not concave on $\Phi_1, \cdots, \Phi_N$.
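
As an illustration, the objective and the backhaul constraint of (10) can be evaluated numerically for a given set of compression noises. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, rates are measured in bits (the base of the logarithm is an assumption), and `cond_cov` uses the standard Gaussian conditioning formula for $R_{Y_{1:N}|Y_0}$ with the channels of the cooperative BSs stacked row-wise, which is assumed to agree with the expression (54) of the appendix (not reproduced here).

```python
import numpy as np

def logdet2(M):
    """log2 |det M| (rates in bits; the log base is an assumption)."""
    return np.linalg.slogdet(M)[1] / np.log(2)

def block_diag(mats):
    """Block-diagonal matrix diag(M_1, ..., M_N)."""
    n = sum(m.shape[0] for m in mats)
    out = np.zeros((n, n), dtype=complex)
    k = 0
    for m in mats:
        d = m.shape[0]
        out[k:k + d, k:k + d] = m
        k += d
    return out

def cond_cov(H0, Hs, Q, sigma2):
    """Assumed form of R_{Y_{1:N}|Y_0}: Gaussian conditioning of the
    stacked observations of BS_1..BS_N on Y_0."""
    H = np.vstack(Hs)
    # Covariance of X_s given Y_0: (I + Q H0^H H0 / sigma2)^{-1} Q
    B = np.linalg.solve(np.eye(Q.shape[0]) + Q @ H0.conj().T @ H0 / sigma2, Q)
    return H @ B @ H.conj().T + sigma2 * np.eye(H.shape[0])

def rate_and_backhaul(H0, Hs, Q, sigma2, Phis):
    """Objective and constraint value of (10) for noise covariances Phis."""
    M = np.eye(Q.shape[0]) + Q @ H0.conj().T @ H0 / sigma2
    for H, Phi in zip(Hs, Phis):
        M = M + Q @ H.conj().T @ np.linalg.solve(
            sigma2 * np.eye(H.shape[0]) + Phi, H)
    Phi_inv = block_diag([np.linalg.inv(P) for P in Phis])
    Rc = cond_cov(H0, Hs, Q, sigma2)
    return logdet2(M), logdet2(np.eye(Rc.shape[0]) + Phi_inv @ Rc)
```

A set of noises is feasible for a backhaul capacity $R$ whenever the second returned value does not exceed $R$; very large noises drive both values towards the no-cooperation case.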
B. Useful Upper Bounds

Prior to solving (10), we present two upper bounds on it.

Upper Bound 1: The achievable rate $C$ in (10) is upper bounded by

$$C \leq I\left(X_s; Y_0, Y_{1:N}\right) = \log\det\left(I + \frac{Q}{\sigma_r^2} \sum_{n=0}^N H_{s,n}^\dagger H_{s,n}\right). \tag{11}$$

Upper Bound 2: The achievable rate $C$ in (10) satisfies

$$C \leq I\left(X_s; Y_0\right) + R = \log\det\left(I + \frac{1}{\sigma_r^2} H_{s,0} Q H_{s,0}^\dagger\right) + R. \tag{12}$$

Proof: See Appendix III for the proof. ■
Remark 2: Notice that, independently of the number of BSs, the achievable rate is bounded above by
the capacity with $\mathrm{BS}_0$ alone plus the backhaul rate.
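
The two bounds are straightforward to compute. The sketch below (hypothetical function names, rates in bits by assumption) evaluates (11) and (12) for given channel matrices.

```python
import numpy as np

def logdet2(M):
    """log2 |det M| (rates in bits; the log base is an assumption)."""
    return np.linalg.slogdet(M)[1] / np.log(2)

def upper_bounds(H0, Hs, Q, sigma2, R):
    """Return (bound (11), bound (12)) for the channel H0 to BS_0,
    the list Hs of channels to the cooperative BSs and backhaul rate R."""
    # Sum of H^H H over BS_0 and all cooperative BSs, as in (11).
    G = sum(H.conj().T @ H for H in [H0, *Hs])
    ub1 = logdet2(np.eye(Q.shape[0]) + Q @ G / sigma2)
    # Rate with BS_0 alone plus the backhaul rate, as in (12).
    ub2 = logdet2(np.eye(H0.shape[0]) + H0 @ Q @ H0.conj().T / sigma2) + R
    return ub1, ub2
```

The tighter of the two values bounds $C$: (12) dominates for small $R$, while (11) dominates once the backhaul is no longer the bottleneck.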

IV. The Two-Base Stations Case 

We first solve (10) for $N = 1$. As mentioned, the objective function, which has to be maximized, is
convex on $\Phi_1 \succeq 0$. In order to make it concave, we change the variables $\Phi_1 = A_1^{-1}$, so that

$$C = \max_{A_1 \succeq 0} \log\det\left(I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q H_{s,1}^\dagger \left(A_1 \sigma_r^2 + I\right)^{-1} A_1 H_{s,1}\right) \tag{13}$$

$$\mathrm{s.t.} \quad \log\det\left(I + A_1 R_{Y_1|Y_0}\right) \leq R.$$

The objective is now concave. However, the constraint no longer defines a convex feasible set.
Therefore, the Karush-Kuhn-Tucker (KKT) conditions become necessary$^2$ but not sufficient for optimality.
To solve the problem, we need to resort to the general sufficiency condition [23, Proposition 3.3.4]:
first, we derive a matrix $A_1^*$ for which the KKT conditions hold. Later, we demonstrate that the selected
matrix also satisfies the general sufficiency condition, thus becoming the optimal solution. The optimum
compression noise is finally recovered as $\Phi_1^* = (A_1^*)^{-1}$. This result is presented in Theorem 2:
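Theorem 2: Let $X_s \sim \mathcal{CN}(0, Q)$ and the conditional covariance (see Appendix I-A):

$$R_{Y_1|Y_0} = H_{s,1} \left(I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0}\right)^{-1} Q H_{s,1}^\dagger + \sigma_r^2 I, \tag{14}$$

with eigen-decomposition $R_{Y_1|Y_0} = U \mathrm{diag}(s_1, \cdots, s_{N_1}) U^\dagger$. The optimum "compression" noise at $\mathrm{BS}_1$
is $\Phi_1^* = U \left(\mathrm{diag}(\eta_1, \cdots, \eta_{N_1})\right)^{-1} U^\dagger$, with

$$\eta_j = \left[\frac{1}{\lambda}\left(\frac{1}{\sigma_r^2} - \frac{1}{s_j}\right) - \frac{1}{\sigma_r^2}\right]^+, \tag{15}$$

and $\lambda$ is such that $\sum_{j=1}^{N_1} \log(1 + \eta_j s_j) = R$.
Proof: See Appendix IV for the proof. ■

$^2$ Notice that all feasible points are regular.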
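
Theorem 2 amounts to a water-filling-like allocation over the eigenvalues of $R_{Y_1|Y_0}$, with a single scalar $\lambda$ to be tuned. A minimal numerical sketch follows; the function name, the bisection search over $\lambda \in (0, 1)$ and the use of base-2 logarithms are assumptions of this illustration rather than part of the theorem. The bisection relies on the total rate $\sum_j \log(1 + \eta_j s_j)$ being non-increasing in $\lambda$ and vanishing at $\lambda = 1$.

```python
import numpy as np

def optimal_compression_noise(H0, H1, Q, sigma2, R, tol=1e-10):
    """Closed form of Theorem 2 for N = 1, with lambda found by bisection.

    Returns (A, eta, s, lam) where A = U diag(eta) U^H is the inverse of the
    optimum compression noise (eta_j = 0 means stream j is not forwarded).
    """
    # Conditional covariance (14).
    B = np.linalg.solve(np.eye(Q.shape[0]) + Q @ H0.conj().T @ H0 / sigma2, Q)
    Rc = H1 @ B @ H1.conj().T + sigma2 * np.eye(H1.shape[0])
    s, U = np.linalg.eigh((Rc + Rc.conj().T) / 2)  # enforce Hermitian symmetry

    def eta(lam):
        # Per-stream inverse noise variance, eq. (15).
        return np.maximum((1 / sigma2 - 1 / s) / lam - 1 / sigma2, 0.0)

    def backhaul(lam):
        return np.sum(np.log2(1 + eta(lam) * s))

    lo, hi = 0.0, 1.0          # backhaul(1) = 0 <= R, so hi is always feasible
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if backhaul(lam) > R:
            lo = lam
        else:
            hi = lam
    e = eta(hi)
    A = (U * e) @ U.conj().T   # U diag(eta) U^H
    return A, e, s, hi
```

The compression noise itself is $\Phi_1^* = U \mathrm{diag}(1/\eta_j) U^\dagger$ on the streams with $\eta_j > 0$; streams with $\eta_j = 0$ correspond to infinite noise, i.e., they receive no backhaul rate.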
A. Practical Implementation

The optimum compression in Theorem 2 can be carried out using a practical Transform Coding (TC)
approach. With TC, $\mathrm{BS}_1$ first transforms its received vector using an invertible linear function and then
separately compresses the resulting scalar streams [25]. We show that the conditional Karhunen-Loeve
transform (CKLT) is an optimal linear transformation [26]. First, let us recall that multiplying a vector
by an invertible matrix does not change the mutual information [27], i.e., $I(X_s; Y_0, \hat{Y}_1) = I(X_s; Y_0, U^\dagger \hat{Y}_1)$
and $I(Y_1; \hat{Y}_1 | Y_0) = I(Y_1; U^\dagger \hat{Y}_1 | Y_0)$. From Theorem 2, the optimum compressed vector satisfies
$\hat{Y}_1^* = Y_1 + Z_1^*$, with $Z_1^* \sim \mathcal{CN}(0, U \eta^{-1} U^\dagger)$ and $R_{Y_1|Y_0} = U S U^\dagger$. Therefore, the following compressed
vector is also optimal

$$\hat{Y}_1 = U^\dagger Y_1 + U^\dagger Z_1^*, \tag{16}$$

where vector $U^\dagger Y_1$ is referred to as the CKLT of vector $Y_1$. Notice now that $R_{\hat{Y}_1|Y_0} = R_{U^\dagger Y_1|Y_0} +
R_{U^\dagger Z_1^*} = S + \eta^{-1}$ is diagonal. Therefore, the elements of the compressed vector $\hat{Y}_1$ are conditionally
uncorrelated given $Y_0$. Likewise, so are the elements of vector $U^\dagger Y_1$. Due to this uncorrelation, each
element $j = 1, \cdots, N_1$ of vector $U^\dagger Y_1$ can be compressed, without loss of optimality, independently of
the compression of the other elements, at a compression rate $r_j = \log(1 + \eta_j s_j)$, $j = 1, \cdots, N_1$ [16].
From Theorem 2 we validate that $\sum_{j=1}^{N_1} r_j = R$. This demonstrates that CKLT plus independent coding
of streams is optimal, not only for minimizing distortion as shown in [26], but also for maximizing the
achievable rate of coordinated networks.
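
The decoupling into scalar streams can be checked numerically. The sketch below (hypothetical function name, base-2 logarithms assumed) takes a conditional covariance and a matrix $A = U \eta U^\dagger$ sharing its eigenvectors, applies the CKLT, and returns the per-stream rates $r_j = \log(1 + \eta_j s_j)$.

```python
import numpy as np

def cklt_stream_rates(Rc, A):
    """Per-stream compression rates after the CKLT.

    Rc : conditional covariance R_{Y_1|Y_0} (Hermitian, positive definite)
    A  : inverse compression noise, assumed to share the eigenvectors of Rc
    Returns (U, r) with U the CKLT basis and r_j = log2(1 + eta_j s_j).
    """
    s, U = np.linalg.eigh(Rc)                     # Rc = U diag(s) U^H
    # In the CKLT basis A is diagonal; its diagonal holds eta_j.
    eta = np.real(np.diag(U.conj().T @ A @ U))
    return U, np.log2(1.0 + eta * s)
```

Summing the returned rates reproduces the vector compression rate $\log\det(I + A R_{Y_1|Y_0})$, which is the numerical counterpart of $\sum_j r_j = R$.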

V. The Multiple-Base Stations Case 

Consider now $\mathrm{BS}_0$ assisted by $N > 1$ cooperative BSs. The achievable rate follows (10) where, as
previously, the objective function is not concave over $\Phi_n$, $n = 1, \cdots, N$. To make it concave, we change
the variables: $\Phi_n = A_n^{-1}$, $n = 1, \cdots, N$, so that:

$$C = \max_{A_1, \cdots, A_N \succeq 0} \log\det\left(I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q \sum_{n=1}^N H_{s,n}^\dagger \left(A_n \sigma_r^2 + I\right)^{-1} A_n H_{s,n}\right) \tag{17}$$

$$\mathrm{s.t.} \quad \log\det\left(I + \mathrm{diag}(A_1, \cdots, A_N)\, R_{Y_{1:N}|Y_0}\right) \leq R.$$

Again, the constraint does not define a convex feasible set. Our strategy to solve the optimization is the following:
first, we show that the duality gap for the problem is zero. Later, we propose an iterative algorithm that
solves the dual problem, thus solving the primal too. An interesting property of the dual problem is that
the coupling constraint in (17) is decoupled [23, Chapter 5].
A. The dual problem

Let the Lagrangian of (17) be defined on $A_n \succeq 0$, $n = 1, \cdots, N$ and $\lambda \geq 0$ as:

$$\mathcal{L}(A_1, \cdots, A_N, \lambda) = \log\det\left(I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q \sum_{n=1}^N H_{s,n}^\dagger \left(A_n \sigma_r^2 + I\right)^{-1} A_n H_{s,n}\right)
- \lambda \cdot \left(\log\det\left(I + \mathrm{diag}(A_1, \cdots, A_N)\, R_{Y_{1:N}|Y_0}\right) - R\right). \tag{18}$$

The dual function $g(\lambda)$ for $\lambda \geq 0$ follows [22, Section 5.1]:

$$g(\lambda) = \max_{A_1, \cdots, A_N \succeq 0} \mathcal{L}(A_1, \cdots, A_N, \lambda). \tag{19}$$

The solution of the dual problem is then obtained from

$$C' = \min_{\lambda \geq 0} g(\lambda). \tag{20}$$

Lemma 1: The duality gap for optimization (17) is zero, i.e., the primal problem (17) and the dual
problem (20) have the same solution.

Proof: The duality gap for problems of the form of (17), and satisfying the time-sharing property,
is zero [28, Theorem 1]. The time-sharing property is defined as follows: let $C_x$, $C_y$, $C_z$ be the solution of (17)
for backhaul rates $R_x$, $R_y$, $R_z$, respectively. Consider $R_z = \nu R_x + (1 - \nu) R_y$ for some $0 \leq \nu \leq 1$. Then,
the property is satisfied if and only if $C_z \geq \nu C_x + (1 - \nu) C_y$, $\forall \nu \in [0, 1]$. That is, if the solution of (17)
is concave with respect to the backhaul rate $R$. It is well known that time-sharing of compressions can neither
decrease the resulting distortion [27, Lemma 13.4.1] nor improve the mutual information obtained
from the reconstructed vectors. Hence, the property holds for (17), and the duality gap is zero. ■

We then solve the dual problem in order to obtain the solution of the primal. First, consider maximization
(19). As expected, the maximization cannot be solved in closed form. However, as the feasible set
(i.e., $A_1, \cdots, A_N \succeq 0$) is the cartesian product of convex sets, a block coordinate ascent algorithm$^3$
can be used to search for the maximum [23, Section 2.7]. The algorithm iteratively optimizes the function
with respect to one $A_n$ while keeping the others fixed. It has been previously used to, e.g., solve the
sum-rate problem of MIMO multiple access channels with individual and sum-power constraint [30], [31].
We define it for our problem as:

$$A_n^{t+1} = \arg\max_{A_n \succeq 0} \mathcal{L}\left(A_1^{t+1}, \cdots, A_{n-1}^{t+1}, A_n, A_{n+1}^t, \cdots, A_N^t, \lambda\right), \tag{21}$$

where $t$ is the iteration index. As shown in Theorem 3, the maximization (21) is uniquely attained.

$^3$ Also known as Non-Linear Gauss-Seidel Algorithm [29, Section II-C].
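Theorem 3: Let the optimization $A_n^* = \arg\max_{A_n \succeq 0} \mathcal{L}(A_1, \cdots, A_N, \lambda)$ and the conditional covariance
matrix (see Appendix I-A)

$$R_{Y_n|Y_0, \hat{Y}_n^c} = H_{s,n} \left(I + Q \left(\frac{1}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + \sum_{p \neq n} H_{s,p}^\dagger \left(A_p \sigma_r^2 + I\right)^{-1} A_p H_{s,p}\right)\right)^{-1} Q H_{s,n}^\dagger + \sigma_r^2 I \tag{22}$$

with eigen-decomposition $R_{Y_n|Y_0, \hat{Y}_n^c} = U_n S U_n^\dagger$. The optimization is uniquely attained at $A_n^* = U_n \eta U_n^\dagger$,
where

$$\eta_j = \left[\frac{1}{\lambda}\left(\frac{1}{\sigma_r^2} - \frac{1}{s_j}\right) - \frac{1}{\sigma_r^2}\right]^+, \quad j = 1, \cdots, N_n. \tag{23}$$

Proof: See Appendix V-A for the proof. ■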

Function $\mathcal{L}(A_1, \cdots, A_N, \lambda)$ is continuously differentiable, and the maximization (21) is uniquely
attained. Hence, the sequence $\{A_1^t, \cdots, A_N^t\}$ is proven to converge to a local maximum
[23, Proposition 2.7.1]. To demonstrate convergence to the global maximum, it is necessary to show that
the mapping $T(A_1, \cdots, A_N) = [A_1 + \gamma \nabla_{A_1}\mathcal{L}, \cdots, A_N + \gamma \nabla_{A_N}\mathcal{L}]$ is a block contraction$^4$ for some $\gamma$
[32, Proposition 3.10]. Unfortunately, we were not able to demonstrate the contraction property on the
Lagrangian, although simulation results always suggest global convergence of our algorithm.

Once $g(\lambda)$ has been obtained through the Gauss-Seidel Algorithm$^5$, it remains to minimize it on $\lambda \geq 0$. First,
recall that $g(\lambda)$ is a convex function, defined as the pointwise maximum of a family of affine functions
[22]. Hence, to minimize it, we may use a subgradient approach as, e.g., that proposed by Yu in [31].
The subgradient search consists of following the search direction $-h$, where $h$ is such that

$$g(\lambda') \geq g(\lambda) + h \cdot (\lambda' - \lambda) \quad \forall \lambda'. \tag{24}$$

Such a search is proven to converge to the global minimum for diminishing step-size rules [29, Section
II-B]. Considering the definition of $g(\lambda)$, the following $h$ satisfies (24):

$$h = R - \log\det\left(I + \mathrm{diag}(A_1^*, \cdots, A_N^*)\, R_{Y_{1:N}|Y_0}\right). \tag{25}$$

Therefore, it is used to search for the optimum $\lambda$ as:

$$\text{increase } \lambda \text{ if } h \leq 0 \text{ or decrease } \lambda \text{ if } h \geq 0. \tag{26}$$

Consider now $\lambda = 1$ as the initial value of the Lagrange multiplier. For such a multiplier, the optimum
solution of (19) is $\{A_1^*, \cdots, A_N^*\} = 0$ and the subgradient (25) is $h = R$ (see Appendix V-B). Hence,
following (26), the optimum value of $\lambda$ is strictly lower than one. Algorithm 1 takes all this into account
in order to solve the dual problem, hence solving the primal too. As mentioned, we can only claim
convergence of the algorithm to a local maximum.

$^4$ See [32, Section 3.1.2] for the definition of block-contraction.

$^5$ Assume hereafter that the algorithm has converged to the global maximum of $\mathcal{L}(A_1, \cdots, A_N, \lambda)$.

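Algorithm 1 Multiple-BSs dual problem
1: Initialize $\lambda_{\min} = 0$ and $\lambda_{\max} = 1$
2: repeat
3: $\lambda = (\lambda_{\max} + \lambda_{\min})/2$
4: Obtain $\{A_1^*, \cdots, A_N^*\} = \arg\max \mathcal{L}(A_1, \cdots, A_N, \lambda)$ from Algorithm 2
5: Evaluate $h$ as in (25).
6: if $h \leq 0$, then $\lambda_{\min} = \lambda$, else $\lambda_{\max} = \lambda$
7: until $\lambda_{\max} - \lambda_{\min} \leq \epsilon$
8: $\{\Phi_1^*, \cdots, \Phi_N^*\} = \{(A_1^*)^{-1}, \cdots, (A_N^*)^{-1}\}$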



Algorithm 2 Non-linear Gauss-Seidel to obtain $g(\lambda)$
1: Initialize $A_n^0 = 0$, $n = 1, \cdots, N$ and $t = 0$
2: repeat
3: for $n = 1$ to $N$ do
4: Compute $R_{Y_n|Y_0, \hat{Y}_n^c}\left(A_1^{t+1}, \cdots, A_{n-1}^{t+1}, A_{n+1}^t, \cdots, A_N^t\right)$ from (22).
5: Take its eigen-decomposition $U_n S U_n^\dagger$ and compute $\eta$ as in (23).
6: Update $A_n^{t+1} = U_n \eta U_n^\dagger$
7: end for
8: $t = t + 1$
9: until The sequence converges $\{A_1^t, \cdots, A_N^t\} \rightarrow \{A_1^*, \cdots, A_N^*\}$
10: Return $\{A_1^*, \cdots, A_N^*\}$
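
Algorithms 1 and 2 can be prototyped in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: function names, tolerances and the iteration cap are hypothetical, rates are in bits, the per-block update uses the closed form $\eta_j = [(1/\sigma_r^2 - 1/s_j)/\lambda - 1/\sigma_r^2]^+$, the bisection uses the midpoint of the current interval, and $R_{Y_{1:N}|Y_0}$ is computed by Gaussian conditioning of the stacked observations (assumed to agree with the appendix).

```python
import numpy as np

def logdet2(M):
    return np.linalg.slogdet(M)[1] / np.log(2)

def gauss_seidel(H0, Hs, Q, sigma2, lam, iters=200, tol=1e-9):
    """Algorithm 2: block coordinate ascent on the Lagrangian for fixed lam."""
    As = [np.zeros((H.shape[0], H.shape[0]), dtype=complex) for H in Hs]
    G0 = H0.conj().T @ H0 / sigma2
    for _ in range(iters):
        change = 0.0
        for n, Hn in enumerate(Hs):
            G = G0.copy()
            for p, Hp in enumerate(Hs):
                if p != n:  # contribution of the other compressed signals
                    W = np.linalg.solve(As[p] * sigma2 + np.eye(Hp.shape[0]), As[p])
                    G = G + Hp.conj().T @ W @ Hp
            B = np.linalg.solve(np.eye(Q.shape[0]) + Q @ G, Q)
            Rn = Hn @ B @ Hn.conj().T + sigma2 * np.eye(Hn.shape[0])  # (22)
            s, U = np.linalg.eigh((Rn + Rn.conj().T) / 2)
            eta = np.maximum((1 / sigma2 - 1 / s) / lam - 1 / sigma2, 0.0)
            A_new = (U * eta) @ U.conj().T
            change = max(change, np.abs(A_new - As[n]).max())
            As[n] = A_new
        if change < tol:
            break
    return As

def backhaul_rate(H0, Hs, Q, sigma2, As):
    """log det(I + diag(A_1..A_N) R_{Y_{1:N}|Y_0}), the rate used in (25)."""
    H = np.vstack(Hs)
    B = np.linalg.solve(np.eye(Q.shape[0]) + Q @ H0.conj().T @ H0 / sigma2, Q)
    Rc = H @ B @ H.conj().T + sigma2 * np.eye(H.shape[0])
    D = np.zeros(Rc.shape, dtype=complex)
    k = 0
    for A in As:
        D[k:k + len(A), k:k + len(A)] = A
        k += len(A)
    return logdet2(np.eye(len(Rc)) + D @ Rc)

def solve_dual(H0, Hs, Q, sigma2, R, eps=1e-6):
    """Algorithm 1: bisection on lam driven by the sign of h in (25)."""
    lo, hi = 0.0, 1.0
    while hi - lo > eps:
        lam = (lo + hi) / 2
        As = gauss_seidel(H0, Hs, Q, sigma2, lam)
        h = R - backhaul_rate(H0, Hs, Q, sigma2, As)
        if h <= 0:
            lo = lam
        else:
            hi = lam
    # Return the solution at the feasible end of the final interval.
    return gauss_seidel(H0, Hs, Q, sigma2, hi), hi
```

Returning the point computed at the upper end of the interval is a design choice of this sketch: that end was only ever updated when $h > 0$, so the returned matrices respect the backhaul constraint. As with the algorithm itself, only local convergence of the inner loop can be claimed.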



B. Practical Implementation 

In the network, Distributed Wyner-Ziv compression can be practically implemented using a simple
Successive Wyner-Ziv (S-WZ) approach [20], [33, Theorem 3]. To describe it, let us recall that the
optimum compression noises $\Phi_1^*, \cdots, \Phi_N^*$ are obtained from Algorithm 1, and let $\pi(\cdot)$ be a given
permutation on $\{1, \cdots, N\}$. For such a permutation, the S-WZ coding is defined as follows:
• Parallel Compression: $\mathrm{BS}_{\pi(1)}$ compresses its received vector using a single-source Wyner-Ziv code
with decoder side information $Y_0$ (following Proposition 1), at a compression rate

$$\rho_{\pi(1)} = I\left(Y_{\pi(1)}; \hat{Y}_{\pi(1)} | Y_0\right) = \log\det\left(I + \left(\Phi_{\pi(1)}^*\right)^{-1} R_{Y_{\pi(1)}|Y_0}\right). \tag{27}$$

The conditional covariance is calculated in (53). In parallel, $\mathrm{BS}_{\pi(n)}$, $n > 1$, compresses its signal
using a single-source Wyner-Ziv code with decoder side information $(Y_0, \hat{Y}_{\pi(1:n-1)})$, at a rate

$$\rho_{\pi(n)} = I\left(Y_{\pi(n)}; \hat{Y}_{\pi(n)} | Y_0, \hat{Y}_{\pi(1:n-1)}\right) = \log\det\left(I + \left(\Phi_{\pi(n)}^*\right)^{-1} R_{Y_{\pi(n)}|Y_0, \hat{Y}_{\pi(1:n-1)}}\right). \tag{28}$$

In this case, the conditional covariance can be calculated from (56).
• Successive Decompression: $\mathrm{BS}_0$ first recovers the codeword $\hat{Y}_{\pi(1)}^n$ using side information $Y_0^n$; later,
it successively recovers codewords $\hat{Y}_{\pi(n)}^n$, $n > 1$, using $Y_0^n$, $\hat{Y}_{\pi(1:n-1)}^n$ as side information.
It is easy to check the optimality of the S-WZ coding:

$$\begin{aligned}
\sum_{n=1}^N \rho_{\pi(n)} &= \sum_{n=1}^N I\left(Y_{\pi(n)}; \hat{Y}_{\pi(n)} | Y_0, \hat{Y}_{\pi(1:n-1)}\right) \\
&= \sum_{n=1}^N I\left(Y_{1:N}; \hat{Y}_{\pi(n)} | Y_0, \hat{Y}_{\pi(1:n-1)}\right) \\
&= I\left(Y_{1:N}; \hat{Y}_{1:N} | Y_0\right) \\
&= R.
\end{aligned} \tag{29}$$

The second equality comes from the Markov chain in Proposition 2, and the third from the chain rule for mutual
information. The fourth follows from the fact that $\Phi_1^*, \cdots, \Phi_N^*$ satisfy the constraint in (10) with equality.
Unfortunately, transform coding is not (generally) optimum for S-WZ with $N > 1$, since the eigenvectors
of $\Phi_{\pi(n)}^* = U_n \eta^{-1} U_n^\dagger$ and those of $R_{Y_{\pi(n)}|Y_0, \hat{Y}_{\pi(1:n-1)}} = V_n S V_n^\dagger$ do not necessarily match.
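
The chain-rule argument in (29) can be verified numerically for any decoding order. In the sketch below (hypothetical function names, base-2 logarithms assumed), the conditional covariances needed by (27) and (28) are obtained by standard Gaussian conditioning on the previously decoded vectors $\hat{Y} = Y + Z^c$, which is assumed to match the appendix expressions (53) and (56) not reproduced here.

```python
import numpy as np

def logdet2(M):
    return np.linalg.slogdet(M)[1] / np.log(2)

def swz_rates(Rc, Phis, perm):
    """Successive Wyner-Ziv rates for the BS ordering `perm` (0-based).

    Rc   : covariance of the stacked Y_{1:N} given Y_0
    Phis : list of compression-noise covariances, one per BS (invertible)
    """
    sizes = [P.shape[0] for P in Phis]
    offs = np.concatenate(([0], np.cumsum(sizes)))
    idx = [np.arange(offs[i], offs[i + 1]) for i in range(len(sizes))]
    Phi_full = np.zeros(Rc.shape, dtype=complex)
    for i, P in enumerate(Phis):
        Phi_full[np.ix_(idx[i], idx[i])] = P
    rates, prev = [], np.array([], dtype=int)
    for n in perm:
        cur = idx[n]
        R_n = Rc[np.ix_(cur, cur)]
        if prev.size:
            # Condition Y_n on the already recovered vectors \hat Y_prev.
            C = Rc[np.ix_(cur, prev)]
            S = Rc[np.ix_(prev, prev)] + Phi_full[np.ix_(prev, prev)]
            R_n = R_n - C @ np.linalg.solve(S, C.conj().T)
        rates.append(logdet2(np.eye(cur.size) + np.linalg.solve(Phis[n], R_n)))
        prev = np.concatenate((prev, cur))
    return rates
```

For any permutation, the returned rates add up to $\log\det(I + \mathrm{diag}(\Phi_1^{-1}, \cdots, \Phi_N^{-1}) R_{Y_{1:N}|Y_0})$; only the split of the aggregate backhaul rate among the BSs depends on the decoding order.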

VI. The Multiple User Scenario 

In previous sections, we considered a single user within the network. To complement the analysis, we
study hereafter multiple (i.e., two) senders transmitting simultaneously. The users, $s_1$ and $s_2$, transmit
two independent messages $\omega_u \in \{1, \cdots, 2^{nR_u}\}$, $u = 1, 2$, mapped onto codewords $X_u^n$, $u = 1, 2$,
respectively. Codewords are drawn i.i.d. from random vectors $X_u \sim \mathcal{CN}(0, Q_u)$, $u = 1, 2$ and are not
subject to optimization. Hence, now, the BSs receive:

$$Y_i^n = \sum_{u=1}^2 H_{u,i} X_u^n + Z_i^n, \quad i = 0, \cdots, N. \tag{30}$$

Here, $H_{u,i}$ is the MIMO channel between user $s_u$ and $\mathrm{BS}_i$, and $Z_i \sim \mathcal{CN}(0, \sigma_r^2 I)$. As previously,
signals at $\mathrm{BS}_1, \cdots, \mathrm{BS}_N$ are compressed in a distributed manner using a D-WZ code, and later sent to $\mathrm{BS}_0$, which
centralizes decoding. Using standard arguments, the set $\mathcal{C}$ of transmission rates $R_u$, $u = 1, 2$ at which
messages $\omega_u$, $u = 1, 2$ can be reliably decoded is [27], [14]:

$$\mathcal{C} = \mathrm{coh}\left(\bigcup_{\prod_{i=1}^N p(\hat{Y}_i|Y_i)\,:\, I(Y_{1:N}; \hat{Y}_{1:N}|Y_0) \leq R} \left\{(R_1, R_2) :
\begin{array}{l}
R_1 \leq I\left(X_1; Y_0, \hat{Y}_{1:N} | X_2\right) \\
R_2 \leq I\left(X_2; Y_0, \hat{Y}_{1:N} | X_1\right) \\
R_1 + R_2 \leq I\left(X_1, X_2; Y_0, \hat{Y}_{1:N}\right)
\end{array}\right\}\right). \tag{31}$$

The union in (31) is explained by the fact that compression codebooks might be arbitrarily chosen at
the BSs. Notice that the boundary points of the region can be achieved using superposition coding (SC)
at the users, successive interference cancellation (SIC) at $\mathrm{BS}_0$, and (optionally) time-sharing (TS).
Furthermore, as for the single-user case, the optimum conditional distributions $p(\hat{Y}_i|Y_i)$, $i = 1, \cdots, N$
at the boundary of the region can be proven to be Gaussian$^6$. Therefore, the union in (31) can be restricted
to compressed vectors of the form $\hat{Y}_i = Y_i + Z_i^c$, where $Z_i^c \sim \mathcal{CN}(0, \Phi_i)$. That is:


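$$\mathcal{C} = \mathrm{coh}\left(\bigcup_{\Phi_{1:N} \in c(R)} \left\{(R_1, R_2) :
\begin{array}{l}
R_1 \leq \log\det\left(I + \frac{Q_1}{\sigma_r^2} H_{1,0}^\dagger H_{1,0} + Q_1 \sum_{n=1}^N H_{1,n}^\dagger \left(\sigma_r^2 I + \Phi_n\right)^{-1} H_{1,n}\right) \\
R_2 \leq \log\det\left(I + \frac{Q_2}{\sigma_r^2} H_{2,0}^\dagger H_{2,0} + Q_2 \sum_{n=1}^N H_{2,n}^\dagger \left(\sigma_r^2 I + \Phi_n\right)^{-1} H_{2,n}\right) \\
R_1 + R_2 \leq \log\det\left(I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q \sum_{n=1}^N H_{s,n}^\dagger \left(\sigma_r^2 I + \Phi_n\right)^{-1} H_{s,n}\right)
\end{array}\right\}\right), \tag{32}$$

where $c(R) = \left\{\Phi_{1:N} : \log\det\left(I + \mathrm{diag}\left(\Phi_1^{-1}, \cdots, \Phi_N^{-1}\right) R_{Y_{1:N}|Y_0}\right) \leq R\right\}$, $Q = \mathrm{diag}(Q_1, Q_2)$ and
$H_{s,n} = [H_{1,n} \; H_{2,n}]$, for $n = 0, \cdots, N$. Covariance $R_{Y_{1:N}|Y_0}$ is calculated in Appendix I-B. To evaluate
such a region, we resort to the weighted sum-rate (WSR) optimization [34, Sec. III-C]. That is, we express

$$\mathcal{C} = \left\{(R_1, R_2) : \alpha R_1 + (1 - \alpha) R_2 \leq \mathcal{R}(\alpha), \; \forall \alpha \in [0, 1]\right\}, \tag{33}$$

with $\mathcal{R}(\alpha)$ the maximum WSR, given weights $\alpha$ and $(1 - \alpha)$ for user $s_1$ and $s_2$, respectively. Such
a WSR is achieved with equality at the boundary of the region. Thus, it can be attained considering
SIC at $\mathrm{BS}_0$, which consists of first decoding the user with the lowest weight, treating the second user as
interference. Later, once the first user has been decoded, the decoder subtracts its contribution from the received
signal, and then decodes the second user without interference.

$^6$ Recall that $X_u \sim \mathcal{CN}(0, Q_u)$, $u = 1, 2$. We omit the proof due to space limitations.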
A. Useful Outer Regions

Prior to solving the WSR optimization, we present two outer regions on (32).
Outer Region 1: Rate region (32) is contained within the region

$$\begin{array}{l}
R_1 \leq \log\det\left(I + \frac{Q_1}{\sigma_r^2} \sum_{n=0}^N H_{1,n}^\dagger H_{1,n}\right) \\
R_2 \leq \log\det\left(I + \frac{Q_2}{\sigma_r^2} \sum_{n=0}^N H_{2,n}^\dagger H_{2,n}\right) \\
R_1 + R_2 \leq \log\det\left(I + \frac{Q}{\sigma_r^2} \sum_{n=0}^N H_{s,n}^\dagger H_{s,n}\right).
\end{array} \tag{34}$$

Remark 3: It is the capacity region when $Y_i$, $i = 1, \cdots, N$ are available at $\mathrm{BS}_0$.
Outer Region 2: The sum-rate satisfies

$$R_1 + R_2 \leq \log\det\left(I + \frac{1}{\sigma_r^2} H_{s,0} Q H_{s,0}^\dagger\right) + R. \tag{35}$$

Proof: It is equivalent to the proof of Upper Bound 2. ■

B. Sum Rate Maximization 



The sum-rate of (32) is identical to the maximum transmission rate of a single user $s$ transmitting
a vector $X_s = [X_1^T, X_2^T]^T$, with equivalent channel $H_{s,n} = [H_{1,n}, H_{2,n}]$, $n = 0, \dots, N$. Hence, to
maximize it we resort to Algorithm 1.



C. Weighted Sum Rate Maximization 

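Let us consider the WSR optimization with $\alpha > \frac{1}{2}$ (i.e., higher priority to user 1, which is decoded last
at the SIC). With such a decoding, the maximum rate of user 1 is

$$
\begin{aligned}
R_1 &= I\left( X_1; Y_0, \hat{Y}_{1:N} \,|\, X_2 \right) \qquad (36) \\
&= \log\det\left( I + \frac{Q_1}{\sigma_r^2} H_{1,0}^\dagger H_{1,0} + Q_1 \sum_{n=1}^{N} H_{1,n}^\dagger \left( \sigma_r^2 I + \Phi_n \right)^{-1} H_{1,n} \right).
\end{aligned}
$$

On the other hand, the rate of user 2, which is decoded first, follows:

$$
\begin{aligned}
R_2 &= I\left( X_2; Y_0, \hat{Y}_{1:N} \right) \qquad (37) \\
&= I\left( X_1, X_2; Y_0, \hat{Y}_{1:N} \right) - I\left( X_1; Y_0, \hat{Y}_{1:N} \,|\, X_2 \right) \\
&= \log\det\left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q \sum_{n=1}^{N} H_{s,n}^\dagger \left( \sigma_r^2 I + \Phi_n \right)^{-1} H_{s,n} \right) - R_1,
\end{aligned}
$$

where $Q = \mathrm{diag}(Q_1, Q_2)$ and $H_{s,n} = [H_{1,n}, H_{2,n}]$. The WSR, $\alpha R_1 + (1 - \alpha) R_2$, which has to be
maximized, is convex on $\Phi_1, \dots, \Phi_N$. To make it concave, we use the change of variables $\Phi_n = A_n^{-1}$,
$n = 1, \dots, N$. Then, plugging (36) and (37) into (33), the WSR optimization turns into

$$
\begin{aligned}
\mathcal{R}(\alpha) = \max_{A_1, \dots, A_N \succeq 0} \ & \alpha \cdot R_1 + (1 - \alpha) \cdot R_2 \qquad (38) \\
\text{s.t.} \ & \log\det\left( I + \mathrm{diag}(A_1, \dots, A_N) R_{Y_{1:N}|Y_0} \right) \leq R
\end{aligned}
$$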
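As a numerical illustration of (36)-(38), the following Python sketch evaluates the two SIC rates and the left-hand side of the backhaul constraint for given compression covariances $\Phi_n$. It is only a sketch under stated assumptions: NumPy is used, rates are measured in bits (base-2 logarithms), the conditional covariance $R_{Y_{1:N}|Y_0}$ takes the form of (54), and the function names (`sic_rates`, `backhaul_load`, etc.) as well as the random channels are hypothetical choices made for the example.

```python
import numpy as np

def herm(M):
    return M.conj().T

def logdet2(M):
    # log2|det(M)| via slogdet, numerically safer than log(det(M))
    return np.linalg.slogdet(M)[1] / np.log(2.0)

def stack_users(Q1, Q2, H1, H2):
    # Q = diag(Q1, Q2) and H_{s,n} = [H_{1,n}, H_{2,n}]
    Z = np.zeros((Q1.shape[0], Q2.shape[1]))
    Q = np.block([[Q1, Z], [Z.T, Q2]])
    return Q, [np.hstack([a, b]) for a, b in zip(H1, H2)]

def rate_term(Q, H, Phi, sigma2):
    # log2 det(I + Q/sigma2 H_0^H H_0 + Q sum_n H_n^H (sigma2 I + Phi_n)^-1 H_n)
    S = herm(H[0]) @ H[0] / sigma2
    for Hn, Pn in zip(H[1:], Phi):
        S = S + herm(Hn) @ np.linalg.solve(sigma2 * np.eye(Hn.shape[0]) + Pn, Hn)
    return logdet2(np.eye(Q.shape[0]) + Q @ S)

def sic_rates(Q1, Q2, H1, H2, Phi, sigma2):
    # User 2 decoded first, user 1 decoded last: equations (36) and (37)
    Q, Hs = stack_users(Q1, Q2, H1, H2)
    R1 = rate_term(Q1, H1, Phi, sigma2)
    R2 = rate_term(Q, Hs, Phi, sigma2) - R1
    return R1, R2

def backhaul_load(Q1, Q2, H1, H2, Phi, sigma2):
    # Left-hand side of the constraint in (38), with A_n = Phi_n^-1
    Q, Hs = stack_users(Q1, Q2, H1, H2)
    G = np.vstack(Hs[1:])                      # stacked channels of BS_1..BS_N
    K = np.linalg.solve(np.eye(Q.shape[0]) + Q @ herm(Hs[0]) @ Hs[0] / sigma2, Q)
    Rc = G @ K @ herm(G) + sigma2 * np.eye(G.shape[0])   # form of (54)
    A = np.zeros(Rc.shape, dtype=complex)
    k = 0
    for Pn in Phi:
        m = Pn.shape[0]
        A[k:k + m, k:k + m] = np.linalg.inv(Pn)
        k += m
    return logdet2(np.eye(G.shape[0]) + A @ Rc)

# Hypothetical example: N = 2 cooperative BSs, 3 receive and 2 transmit antennas
rng = np.random.default_rng(0)
N, n_rx, n_tx, sigma2 = 2, 3, 2, 1.0
draw = lambda: (rng.standard_normal((n_rx, n_tx))
                + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2.0)
H1 = [draw() for _ in range(N + 1)]
H2 = [draw() for _ in range(N + 1)]
Q1 = Q2 = np.eye(n_tx)
Phi = [0.5 * np.eye(n_rx) for _ in range(N)]
R1, R2 = sic_rates(Q1, Q2, H1, H2, Phi, sigma2)
load = backhaul_load(Q1, Q2, H1, H2, Phi, sigma2)
print(R1, R2, load)
```

A point $(R_1, R_2)$ computed this way is admissible only if `load` does not exceed the backhaul rate $R$; increasing $\Phi_n$ (coarser compression) lowers both the load and the rates.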

As previously, the constraint does not define a convex feasible set. To solve the optimization, we follow 
the strategy presented previously: first, we show that the optimization has zero duality gap. Later, we 
propose an iterative algorithm that solves the dual problem, thus solving the primal too. 

Lemma 2: The duality gap for the WSR optimization (38) is zero. 

Proof: Applying the time-sharing property in [28, Theorem 1] the zero-duality gap is demonstrated. ■ 

Let us then solve the dual problem. The Lagrangian for optimization (38) is defined as:

$$\mathcal{L}_\alpha(A_1, \dots, A_N, \lambda) = \alpha \cdot R_1 + (1 - \alpha) \cdot R_2 - \lambda \cdot \left( \log\det\left( I + \mathrm{diag}(A_1, \dots, A_N) R_{Y_{1:N}|Y_0} \right) - R \right) \qquad (39)$$

The first step is to find the dual function [23, Section 5]

$$g_\alpha(\lambda) = \max_{A_1, \dots, A_N \succeq 0} \mathcal{L}_\alpha(A_1, \dots, A_N, \lambda) \qquad (40)$$

In previous sections, we showed that such an optimization can be tackled using a block-coordinate
algorithm. Unfortunately, now, the maximization with respect to a single $A_n$ cannot be solved in closed
form, and it is not clear that it is uniquely attained. Hence, to solve (40), we propose another algorithm: the
gradient projection (GP) method [23, Section 2.3]. GP has been used, e.g., to compute transmit covariances
for MIMO interference channels, and the WSR of MIMO broadcast channels [35, Section IV-C] [36]. It
is defined as follows: consider the maximization (40) and an initial point $\{A_1^0, \dots, A_N^0\} \succeq 0$. GP iteratively updates
[23, Section 2.3.1]:

$$A_n^{t+1} = A_n^t + \gamma_t \left( \bar{A}_n^t - A_n^t \right), \quad n = 1, \dots, N \qquad (41)$$

where $t$ is the iteration index and $0 < \gamma_t \leq 1$ is the step size. Also,

$$\bar{A}_n^t = \left[ A_n^t + s_t \nabla_{A_n} \mathcal{L}_\alpha\left( \lambda, A_1^t, \dots, A_N^t \right) \right]^{+}, \quad n = 1, \dots, N \qquad (42)$$

with $s_t \geq 0$ a scalar and $\nabla_{A_n} \mathcal{L}_\alpha(\lambda, A_1^t, \dots, A_N^t)$ the gradient of $\mathcal{L}_\alpha(\cdot)$ with respect to $A_n$, evaluated
at $A_1^t, \dots, A_N^t$. Finally, $[\cdot]^{+}$ denotes the projection (with respect to the Frobenius norm) onto the cone of
positive semidefinite matrices. Whenever $\gamma_t$ and $s_t$ are chosen appropriately, the sequence $\{A_1^t, \dots, A_N^t\}$
is proven to converge to a local maximum of (40) [23, Proposition 2.2.1]. (For global convergence to
hold, the contraction property must be satisfied. Unfortunately, we were not able to prove this property
for our optimization.) In order to make the algorithm work for the problem, we need to: i) compute the
projection of a Hermitian matrix $S$, with eigen-decomposition $S = U \eta U^\dagger$, onto the cone of positive
semidefinite matrices. It is equal to [37, Theorem 2.1]:

$$[S]^{+} = U \, \mathrm{diag}\left( \max\{\eta_1, 0\}, \dots, \max\{\eta_m, 0\} \right) U^\dagger \qquad (43)$$
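The projection in (43) can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the function name `project_psd` and the example matrix are hypothetical, and the explicit symmetrization step is an added numerical safeguard rather than part of (43).

```python
import numpy as np

def project_psd(S):
    """Frobenius-norm projection of a Hermitian matrix onto the PSD cone, as in (43)."""
    S = 0.5 * (S + S.conj().T)            # remove numerical asymmetry (safeguard)
    eta, U = np.linalg.eigh(S)            # S = U diag(eta) U^H
    return (U * np.maximum(eta, 0.0)) @ U.conj().T   # clip negative eigenvalues

S = np.array([[1.0, 2.0], [2.0, -3.0]])   # indefinite Hermitian example
P = project_psd(S)
print(np.linalg.eigvalsh(P))
```

Projecting a matrix that is already positive semidefinite returns it unchanged, so the operation is idempotent.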

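ii) Obtain the gradient of $\mathcal{L}_\alpha(\cdot)$ with respect to a single $A_n$, which is twice the conjugate of the partial
derivative of the function with respect to such a matrix [24]:

$$\nabla_{A_n} \mathcal{L}_\alpha(A_{1:N}, \lambda) = 2 \left( \left[ \frac{\partial \mathcal{L}_\alpha(A_{1:N}, \lambda)}{\partial A_n} \right]^{T} \right)^{\dagger} \qquad (44)$$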



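The Lagrangian is defined in (39). To obtain its partial derivative, we make use of (79):

$$
\begin{aligned}
\left[ \frac{\partial \log\det\left( I + \mathrm{diag}(A_1, \dots, A_N) R_{Y_{1:N}|Y_0} \right)}{\partial A_n} \right]^{T}
&= \left[ \frac{\partial \log\det\left( I + A_n R_{Y_n|Y_0,\hat{Y}_n^c} \right)}{\partial A_n} \right]^{T} \qquad (45) \\
&= R_{Y_n|Y_0,\hat{Y}_n^c} \left( I + A_n R_{Y_n|Y_0,\hat{Y}_n^c} \right)^{-1}.
\end{aligned}
$$

The conditional covariance is computed in Appendix I-B. Furthermore, we can also derive that

$$
\frac{\partial R_1}{\partial A_n} = \frac{\partial I\left( X_1; Y_0, \hat{Y}_{1:N} \,|\, X_2 \right)}{\partial A_n}
= \frac{\partial I\left( X_1; \hat{Y}_n \,|\, X_2, Y_0, \hat{Y}_n^c \right)}{\partial A_n} \qquad (46)
$$

where the second equality follows from the chain rule for mutual information and noting that $I(X_1; Y_0, \hat{Y}_n^c | X_2)$
does not depend on $A_n$. The mutual information above is evaluated as:

$$
\begin{aligned}
I\left( X_1; \hat{Y}_n \,|\, X_2, Y_0, \hat{Y}_n^c \right) &= H\left( \hat{Y}_n \,|\, X_2, Y_0, \hat{Y}_n^c \right) - H\left( \hat{Y}_n \,|\, X_1, X_2, Y_0, \hat{Y}_n^c \right) \qquad (47) \\
&= \log\det\left( R_{Y_n|X_2,Y_0,\hat{Y}_n^c} + \Phi_n \right) - \log\det\left( \sigma_r^2 I + \Phi_n \right) \\
&= \log\det\left( A_n R_{Y_n|X_2,Y_0,\hat{Y}_n^c} + I \right) - \log\det\left( A_n \sigma_r^2 + I \right)
\end{aligned}
$$

The last equality follows from $\Phi_n = A_n^{-1}$, and $R_{Y_n|X_2,Y_0,\hat{Y}_n^c}$ is computed in Appendix I-B. Therefore, the
derivative of $R_1$ remains [24]

$$
\left[ \frac{\partial R_1}{\partial A_n} \right]^{T} = R_{Y_n|X_2,Y_0,\hat{Y}_n^c} \left( A_n R_{Y_n|X_2,Y_0,\hat{Y}_n^c} + I \right)^{-1} - \sigma_r^2 \left( A_n \sigma_r^2 + I \right)^{-1}. \qquad (48)
$$

Equivalently, we can obtain for the derivative of $R_2$ that

$$
\frac{\partial R_2}{\partial A_n} = \frac{\partial I\left( X_2; Y_0, \hat{Y}_{1:N} \right)}{\partial A_n}
= \frac{\partial I\left( X_2; \hat{Y}_n \,|\, Y_0, \hat{Y}_n^c \right)}{\partial A_n} \qquad (49)
$$

where we evaluate:

$$
\begin{aligned}
I\left( X_2; \hat{Y}_n \,|\, Y_0, \hat{Y}_n^c \right) &= H\left( \hat{Y}_n \,|\, Y_0, \hat{Y}_n^c \right) - H\left( \hat{Y}_n \,|\, X_2, Y_0, \hat{Y}_n^c \right) \qquad (50) \\
&= \log\det\left( A_n R_{Y_n|Y_0,\hat{Y}_n^c} + I \right) - \log\det\left( A_n R_{Y_n|X_2,Y_0,\hat{Y}_n^c} + I \right)
\end{aligned}
$$

Conditional covariances are obtained in Appendix I-B. The derivative of $R_2$ thus remains:

$$
\left[ \frac{\partial R_2}{\partial A_n} \right]^{T} = R_{Y_n|Y_0,\hat{Y}_n^c} \left( A_n R_{Y_n|Y_0,\hat{Y}_n^c} + I \right)^{-1} - R_{Y_n|X_2,Y_0,\hat{Y}_n^c} \left( A_n R_{Y_n|X_2,Y_0,\hat{Y}_n^c} + I \right)^{-1}. \qquad (51)
$$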

Plugging (45), (48) and (51) into (44) we obtain the gradient of the function, which is used in the GP
algorithm to obtain $g_\alpha(\lambda)$. Notice that for $\alpha < \frac{1}{2}$, the roles of users $s_1$ and $s_2$ are interchanged, user 1
being decoded first. These roles would also need to be interchanged in the computation of the gradients
of $R_1$ and $R_2$. Once the dual function is obtained, we minimize it to obtain:

$$\mathcal{R}(\alpha) = \min_{\lambda \geq 0} g_\alpha(\lambda). \qquad (52)$$

To solve this minimization, we use the subgradient approach as in Section V. Taking all this into account 
we build up Algorithm 3. As for the previous section, we can only claim local convergence. 

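Algorithm 3 Two-user WSR dual problem
1: Initialize $\lambda_{\min} = 0$ and $\lambda_{\max}$
2: repeat
3: $\lambda = (\lambda_{\max} + \lambda_{\min}) / 2$
4: Obtain $\{A_1^*, \dots, A_N^*\} = \arg\max \mathcal{L}_\alpha(A_1, \dots, A_N, \lambda)$ from Algorithm 4
5: Evaluate $h$ as in (25), where $R_{Y_{1:N}|Y_0}$ follows Appendix I-B.
6: if $h \leq 0$, then $\lambda_{\min} = \lambda$, else $\lambda_{\max} = \lambda$
7: until $\lambda_{\max} - \lambda_{\min} \leq \epsilon$
8: $\mathcal{R}(\alpha) = \alpha R_1(A_1^*, \dots, A_N^*) + (1 - \alpha) R_2(A_1^*, \dots, A_N^*)$.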
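The nesting of an outer bisection on $\lambda$ with an inner gradient projection loop can be sketched in Python as follows. This is a toy illustration, not the two-user algorithm itself: to keep it short and checkable, the two-user gradient (44) is replaced by the gradient of the simpler single-BS Lagrangian of the form (73), for which the closed-form maximizer (76) is known. All names, the constant choices of $s_t$ and $\gamma_t$, the use of natural logarithms (rates in nats), the initial $\lambda_{\max} = 1$, and the form $h = R - \log\det(I + A R_c)$ assumed for the subgradient are assumptions made for the example.

```python
import numpy as np

def project_psd(S):
    # Projection (43): clip the negative eigenvalues of a Hermitian matrix
    S = 0.5 * (S + S.conj().T)
    eta, U = np.linalg.eigh(S)
    return (U * np.maximum(eta, 0.0)) @ U.conj().T

def grad_lagrangian(A, lam, Rc, sigma2):
    # Hermitian gradient of (1-lam) logdet(I + A Rc) - logdet(I + sigma2 A)
    I = np.eye(A.shape[0])
    return ((1.0 - lam) * Rc @ np.linalg.inv(I + A @ Rc)
            - sigma2 * np.linalg.inv(I + sigma2 * A))

def gradient_projection(lam, Rc, sigma2, step=0.2, gamma=1.0, tol=1e-12, max_iter=5000):
    # Inner loop: constant s_t = step and gamma_t = gamma (assumed choices)
    A = np.zeros_like(Rc)
    for _ in range(max_iter):
        A_bar = project_psd(A + step * grad_lagrangian(A, lam, Rc, sigma2))
        A_new = A + gamma * (A_bar - A)
        if np.linalg.norm(A_new - A) < tol:
            return A_new
        A = A_new
    return A

def load(A, Rc):
    # Backhaul load logdet(I + A Rc), in nats
    return np.linalg.slogdet(np.eye(A.shape[0]) + A @ Rc)[1]

def dual_bisection(Rc, sigma2, R_bh, eps=1e-9):
    # Outer loop: bisection on lam, with h = R_bh - load (assumed subgradient)
    lam_min, lam_max = 0.0, 1.0
    while lam_max - lam_min > eps:
        lam = 0.5 * (lam_min + lam_max)
        A = gradient_projection(lam, Rc, sigma2)
        h = R_bh - load(A, Rc)
        if h <= 0:
            lam_min = lam
        else:
            lam_max = lam
    return lam, A

# Hypothetical 2x2 conditional covariance with eigenvalues 4 and 1.5
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
U = np.array([[c, -s], [s, c]])
Rc = U @ np.diag([4.0, 1.5]) @ U.T
lam_opt, A_opt = dual_bisection(Rc, sigma2=1.0, R_bh=1.0)
print(lam_opt, np.linalg.eigvalsh(A_opt))
```

In this toy case the eigenvalues of the returned matrix can be compared against (76) evaluated at the returned multiplier, which gives a simple sanity check of the inner loop.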



VII. Numerical Results 

We evaluate the performance of D-WZ coding within a single-frequency network composed of a central
base station BS$_0$ plus its first tier of six cells. The radius of each cell is 700 m, and all BSs have three
receive antennas. On the other hand, users have two antennas, are located at the edge of the central cell
and transmit isotropically, i.e., with a scaled-identity covariance $Q_i \propto I$. Transmitted power is set to 23 dBm, and wireless channels are
simulated taking into account path loss, log-normal shadowing and Rayleigh fading. Specifically, fading
is assumed i.i.d. among antennas, and shadowing uncorrelated among BSs. Two propagation scenarios
Algorithm 4 GP to obtain $g_\alpha(\lambda)$
1: Initialize $A_n^0 = 0$, $n = 1, \dots, N$ and $t = 0$
2: repeat
3: Compute the gradient $G_n^t = \nabla_{A_n} \mathcal{L}_\alpha(\lambda, A_1^t, \dots, A_N^t)$, $n = 1, \dots, N$ from (44).
4: Choose appropriate $s_t$
5: Set $\tilde{A}_n^t = A_n^t + s_t \cdot G_n^t$. Calculate $\tilde{A}_n^t = U_n \eta U_n^\dagger$. Then, $\bar{A}_n^t = U_n \max\{\eta, 0\} U_n^\dagger$, $n = 1, \dots, N$.
6: Choose appropriate $\gamma_t$
7: Update $A_n^{t+1} = A_n^t + \gamma_t (\bar{A}_n^t - A_n^t)$, $n = 1, \dots, N$
8: $t = t + 1$
9: until The sequence converges $\{A_1^t, \dots, A_N^t\} \to \{A_1^*, \dots, A_N^*\}$
10: Return $\{A_1^*, \dots, A_N^*\}$



are studied: i) Line-of-sight (LOS), with path-loss exponent $\alpha = 2.6$ and shadowing standard deviation
$\sigma = 4$ dB. ii) Non-line-of-sight (N-LOS), with $\alpha = 4.05$ and $\sigma = 10$ dB.
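To make the channel model concrete, the following Python sketch draws one BS-user channel matrix combining the three effects listed above. It is an illustration under assumptions that the text does not specify: the path gain is normalized to 1 at a reference distance of 1 m, antenna gains and carrier frequency are ignored, and the names `path_gain`, `sample_channel` and `SCENARIOS` are hypothetical.

```python
import numpy as np

# (path-loss exponent, shadowing standard deviation in dB) for the two scenarios
SCENARIOS = {"LOS": (2.6, 4.0), "N-LOS": (4.05, 10.0)}

def path_gain(d_m, alpha, d0_m=1.0):
    # Power gain (d / d0)^(-alpha); the 1 m reference distance is an assumption
    return (d_m / d0_m) ** (-alpha)

def sample_channel(d_m, alpha, shadow_db, n_rx, n_tx, rng):
    # One log-normal shadowing draw per BS-user link (uncorrelated among BSs)
    shadow = 10.0 ** (shadow_db * rng.standard_normal() / 10.0)
    # i.i.d. unit-variance Rayleigh fading for every antenna pair
    fading = (rng.standard_normal((n_rx, n_tx))
              + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2.0)
    return np.sqrt(path_gain(d_m, alpha) * shadow) * fading

rng = np.random.default_rng(0)
H = sample_channel(700.0, *SCENARIOS["LOS"], n_rx=3, n_tx=2, rng=rng)
print(H.shape)
```

Drawing one such matrix per BS and per channel realization gives the inputs needed to build empirical rate distributions.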

Fig. 2 plots the cumulative distribution function (cdf) of the uplink rate$^7$ for a single-user network,
considering different values of the backhaul rate R. In particular, Fig. 2(a) depicts results for LOS
propagation, and shows gains of up to 6 Mbit/s at 5% outage, with R = 15 Mbit/s. It is clearly shown that
BS cooperation yields larger gains at lower outage probabilities. On the other hand, Fig. 2(b)
shows results for N-LOS propagation, where rate gains are reduced. In this case, cooperation becomes
more beneficial at higher outages: at 50% outage, three-fold gains arise with 15 Mbit/s
of backhaul.

Fig. 3 plots the uplink rate of a single-user network with R = 7 Mbit/s, for different numbers N of
cooperative BSs. First, Fig. 3(a) depicts the cdf of the user's rate under LOS propagation conditions. We
notice that at 5% outage, with only 1 cooperative BS, a rate gain of 2 Mbit/s is obtained with respect
to the non-cooperative case. However, when increasing the number of cooperative BSs to 6, only an
additional rate gain of 2 Mbit/s is obtained. That is, the impact of introducing new cooperative BSs in
the system diminishes as the network grows. Again, cooperation is more useful for low outages. On
the other hand, Fig. 3(b) depicts results for N-LOS propagation. It can be seen that, at 50% outage,
the rate is doubled from 1 cooperative BS to 6 cooperative BSs. This fact highlights the relevant role of

7 The user is assumed to transmit at 1 Mbaud, i.e., 1 Msymb/s. 



February 8, 2008 



DRAFT 



20 



macro-diversity in N-LOS conditions, which are the most common ones in urban cellular networks. Next,
Fig. 4 compares the rate performance of our D-WZ approach with that of Quantization [10],
assuming LOS propagation. We consider a simple network with two BSs, BS$_0$ and BS$_1$, and plot its
outage capacity with D-WZ and with uniform quantization, respectively. Both are normalized with respect
to the outage capacity with infinite backhaul and computed at a probability of outage of $10^{-2}$. Results
show significant gains, of up to 12%, for low backhaul rates, and highlight the fact that D-WZ requires
half the backhaul rate of Quantization to converge to the infinite-backhaul capacity.

Fig. 5 depicts the expected sum-rate$^8$ of the multi-user setup versus the total number of users. Results
are shown for different values of the backhaul rate. Although the sum-rate analysis (see Sec. VI-B) was
carried out for two users only, the extension to $U > 2$ is straightforward. Fig. 5(a) depicts the sum-rate
for LOS propagation. We first notice that the sum-rate with infinite backhaul capacity (i.e., outer region 1)
is far from the sum-rate with D-WZ compression. This is explained by means of outer region 2:
the sum-rate of the system is constrained by the available rate at the backhaul network. On the other
hand, for N-LOS propagation (Fig. 5(b)), upper bound 2 is not reached. Indeed, for fewer than 5 users,
the expected sum-rate with only R = 15 Mbit/s of backhaul is almost identical to that of $R = \infty$.
Therefore, for a practical number of transmitters, the full rate gain due to macro-diversity is obtained via
D-WZ compression. Finally, Fig. 6(a) and Fig. 6(b) depict the rate region of a 2-user network, with and
without LOS respectively, for different values of the backhaul rate R. It is clearly shown that the region
is significantly enlarged with only 5 Mbit/s of backhaul rate.

VIII. Conclusions 

We studied distributed compression for the uplink of a coordinated cellular network with $N + 1$ multi-antenna
BSs. Considering a constrained backhaul of limited capacity R, base stations BS$_1$, ..., BS$_N$
compress their received signals in a distributed manner using a Distributed Wyner-Ziv code. The compressed vectors
are sent to BS$_0$, which centralizes the users' decoding. Considering single and multiple users within the
network, respectively, the D-WZ scheme has been optimized using the users' rate as the performance
metric.

Appendix I 
Conditional Covariances 

We derive here conditional covariances used throughout the paper. (See supporting material) 
8 The expected sum-rate is obtained by averaging the sum-rate of the system over the user's channels. 
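A. The single user case

$$R_{Y_n|Y_0} = H_{s,n} \left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} \right)^{-1} Q H_{s,n}^\dagger + \sigma_r^2 I, \quad n = 1, \dots, N. \qquad (53)$$

$$
R_{Y_{1:N}|Y_0} =
\begin{bmatrix} H_{s,1} \\ \vdots \\ H_{s,N} \end{bmatrix}
\left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} \right)^{-1} Q
\begin{bmatrix} H_{s,1} \\ \vdots \\ H_{s,N} \end{bmatrix}^{\dagger}
+ \sigma_r^2 I. \qquad (54)
$$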



(55) 



R 



Y n \Y ,Y g 



(56) 



B. The multiuser case

Define $H_{s,n} = [H_{1,n}, H_{2,n}]$ and $Q = \mathrm{diag}(Q_1, Q_2)$. Then, the conditional covariances that are not conditioned on a user's signal, such as $R_{Y_n|Y_0}$, $R_{Y_{1:N}|Y_0}$
and $R_{Y_n|Y_0,\hat{Y}_n^c}$, follow Subsection I-A. Furthermore, let $i, j \in \{1, 2\}$ with $j \neq i$, then



-1 



R 



Appendix II 
Proof of Proposition 1 

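Let us consider the chain rule for mutual information:

$$I\left( X_s; Y_0, \hat{Y}_{1:N} \right) = I\left( X_s; Y_0 \right) + I\left( X_s; \hat{Y}_{1:N} | Y_0 \right). \qquad (58)$$

Also, let us expand the constraint to obtain:

$$
\begin{aligned}
I\left( Y_{1:N}; \hat{Y}_{1:N} | Y_0 \right) &= H\left( \hat{Y}_{1:N} | Y_0 \right) - H\left( \hat{Y}_{1:N} | Y_0, Y_{1:N} \right) \\
&= I\left( X_s; \hat{Y}_{1:N} | Y_0 \right) + H\left( \hat{Y}_{1:N} | Y_0, X_s \right) - H\left( \hat{Y}_{1:N} | Y_0, Y_{1:N} \right). \qquad (59)
\end{aligned}
$$

Given the Markov chain in Theorem 2: $H(\hat{Y}_{1:N} | Y_0, Y_{1:N}) = H(\hat{Y}_{1:N} | Y_0, Y_{1:N}, X_s)$, which plugged
into (59):

$$I\left( Y_{1:N}; \hat{Y}_{1:N} | Y_0 \right) = I\left( X_s; \hat{Y}_{1:N} | Y_0 \right) + I\left( Y_{1:N}; \hat{Y}_{1:N} | Y_0, X_s \right). \qquad (60)$$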
Let now $\mathcal{P}$ be the feasible set of conditional probabilities $\prod_{n=1}^{N} p(\hat{Y}_n | Y_n)$, i.e., the set for which
$I(Y_{1:N}; \hat{Y}_{1:N} | Y_0) \leq R$. Hence, making use of (60), the feasible set satisfies:

$$I\left( X_s; \hat{Y}_{1:N} | Y_0 \right) \leq R - I\left( Y_{1:N}; \hat{Y}_{1:N} | Y_0, X_s \right). \qquad (61)$$

Introducing (61) into (58), we derive that for the feasible set:

$$I\left( X_s; Y_0, \hat{Y}_{1:N} \right) \leq I\left( X_s; Y_0 \right) + R - I\left( Y_{1:N}; \hat{Y}_{1:N} | Y_0, X_s \right). \qquad (62)$$

Now, notice that $I(Y_{1:N}; \hat{Y}_{1:N} | Y_0, X_s) = I(Z_{1:N}; \hat{Y}_{1:N} | Y_0, X_s)$, where $Z_i$ is the AWGN at BS$_i$.
This mutual information is minimized in $\mathcal{P}$ for $p(\hat{Y}_{1:N})$ Gaussian. Therefore, $I(X_s; Y_0, \hat{Y}_{1:N})$ in (62) is
maximum in $\mathcal{P}$ for Gaussian distributed vectors $\hat{Y}_{1:N}$, specifically those satisfying $I(Y_{1:N}; \hat{Y}_{1:N} | Y_0) = R$
(i.e., those for which equality holds in (62) and (61)). As mentioned, the received vectors $Y_i$ are also
Gaussian. Therefore, at the optimum, $Y_i$ and $\hat{Y}_i$ are jointly Gaussian, so we can write $\hat{Y}_i = M Y_i + Z_i^c$
with $M$ a constant matrix and $Z_i^c$ an independent Gaussian vector. However, as the multiplication by a
matrix does not affect mutual information, we can state that vectors $\hat{Y}_i = Y_i + Z_i^c$ are also optimal, with
$Z_i^c \sim \mathcal{CN}(0, \Phi_i)$. Using this relationship, we evaluate

$$I\left( X_s; Y_0, \hat{Y}_{1:N} \right) = \log\det\left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q \sum_{n=1}^{N} H_{s,n}^\dagger \left( \sigma_r^2 I + \Phi_n \right)^{-1} H_{s,n} \right). \qquad (63)$$

Furthermore, we can also obtain:

$$
\begin{aligned}
I\left( Y_{1:N}; \hat{Y}_{1:N} | Y_0 \right) &= H\left( \hat{Y}_{1:N} | Y_0 \right) - H\left( \hat{Y}_{1:N} | Y_{1:N}, Y_0 \right) \qquad (64) \\
&= \log\det\left( I + \mathrm{diag}\left( \Phi_1^{-1}, \dots, \Phi_N^{-1} \right) R_{Y_{1:N}|Y_0} \right).
\end{aligned}
$$

Appendix III 
Proof of Upper Bound 2 

To prove the statement, we first rewrite the objective and constraint of (9) as (58) and (60), respectively.
At the optimum point of maximization (9), the constraint is satisfied. Therefore, $I(Y_{1:N}; \hat{Y}_{1:N} | Y_0) \leq R$,
which plugged into (60) obtains

$$I\left( X_s; \hat{Y}_{1:N} | Y_0 \right) \leq R - I\left( Y_{1:N}; \hat{Y}_{1:N} | Y_0, X_s \right), \qquad (65)$$

which in turn, introduced into (58), allows us to bound

$$I\left( X_s; Y_0, \hat{Y}_{1:N} \right) \leq I\left( X_s; Y_0 \right) + R - I\left( Y_{1:N}; \hat{Y}_{1:N} | Y_0, X_s \right). \qquad (66)$$

Since $I(Y_{1:N}; \hat{Y}_{1:N} | Y_0, X_s) \geq 0$ by definition, we can state that $I(X_s; Y_0, \hat{Y}_{1:N}) \leq I(X_s; Y_0) + R$.
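Appendix IV
Proof of Proposition 2

In this Appendix, we solve the non-convex optimization (13). Let us first expand:

$$
\begin{aligned}
&\log\det\left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q H_{s,1}^\dagger \left( A_1 \sigma_r^2 + I \right)^{-1} A_1 H_{s,1} \right) \qquad (67) \\
&\quad = \log\det\left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} \right) + \log\det\left( I + \left( A_1 \sigma_r^2 + I \right)^{-1} A_1 \left( R_{Y_1|Y_0} - \sigma_r^2 I \right) \right) \\
&\quad = \log\det\left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} \right) + \log\det\left( I + A_1 R_{Y_1|Y_0} \right) - \log\det\left( I + A_1 \sigma_r^2 \right).
\end{aligned}
$$

The first equality follows from the value of $R_{Y_1|Y_0}$ in (53). Notice that $\log\det\left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} \right)$ does not
depend on $A_1$. Therefore, the Lagrangian for the problem can be written as

$$\mathcal{L}(A_1, \lambda, \Psi) = (1 - \lambda) \log\det\left( I + A_1 R_{Y_1|Y_0} \right) - \log\det\left( I + A_1 \sigma_r^2 \right) + \lambda R - \mathrm{tr}\{\Psi A_1\},$$

where $\lambda$ is the Lagrange multiplier for the explicit constraint and $\Psi \preceq 0$ for the semidefinite positiveness
constraint. The derivative of the Lagrangian with respect to $A_1$ thus reads [24]:

$$\left[ \frac{\partial \mathcal{L}}{\partial A_1} \right]^{T} = (1 - \lambda) R_{Y_1|Y_0} \left( I + A_1 R_{Y_1|Y_0} \right)^{-1} - \sigma_r^2 \left( I + A_1 \sigma_r^2 \right)^{-1} - \Psi. \qquad (68)$$

Accordingly, the KKT conditions for the problem, which are necessary but not sufficient, are:

i) $\left[ \frac{\partial \mathcal{L}}{\partial A_1} \right]^{T} = 0$ (69)
ii) $\lambda \left( \log\det\left( I + A_1 R_{Y_1|Y_0} \right) - R \right) = 0$
iii) $\mathrm{tr}\{\Psi A_1\} = 0$.

Let now the eigen-decomposition be $R_{Y_1|Y_0} = U S U^\dagger$. Then, it can be readily shown that matrix $A_1^* =
U \mathrm{diag}(\eta_1, \dots, \eta_{N_1}) U^\dagger$, with

$$\eta_j = \left[ \frac{1}{\lambda^*} \left( \frac{1}{\sigma_r^2} - \frac{1}{s_j} \right) - \frac{1}{\sigma_r^2} \right]^{+} \qquad (70)$$

satisfies the KKT conditions, with multiplier $\lambda^*$ such that $\sum_{j=1}^{N_1} \log(1 + \eta_j s_j) = R$ (therefore, $\lambda^* < 1$),
and multiplier $\Psi^* \preceq 0$ computed from (68). Let us now show that $A_1^*$ also satisfies the general sufficiency
condition for optimality, which is presented in the next Lemma.

Lemma 3: [23, Proposition 3.3.4] Consider the differentiable maximization (13) and a pair $(A_1^*, \lambda^*)$
for which $\lambda^* \left( \log\det\left( I + A_1^* R_{Y_1|Y_0} \right) - R \right) = 0$. Then, $A_1^*$ is the global maximum of (13) if:

$$A_1^* \in \arg\max_{A_1 \succeq 0} \mathcal{L}(A_1, \lambda^*), \qquad (71)$$

where the Lagrangian$^9$ has been defined in (68).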

Lemma 4: Let $A, B \succeq 0$, with ordered eigenvalues $T_A$, $T_B$ respectively. Then,

$$\log\det(I + AB) \leq \log\det(I + T_A T_B), \qquad (72)$$

with equality whenever $A$ and $B$ have conjugate transpose eigenvectors.

Proof: It is known that $\log\det(I + AB) = \log\det(I + T_{AB})$, where $T_{AB}$ are the ordered
eigenvalues of $AB$. Those eigenvalues are logarithmically majorized [38, Definition 1.4] by the product
of the separate eigenvalues of $A$ and $B$, i.e., $T_{AB} \prec_{\times} T_A T_B$ [39, Theorem 9.H.1.d]. Let now the function
$f(X) = \log\det(I + X)$ be defined on the set of semi-definite positive diagonal matrices, i.e., $f(X) =
\sum_i \log(1 + x_i)$. We may apply [38, Theorem 1.6] to prove that $f(X)$ is a Schur-geometrically-convex
function. Accordingly, provided that $T_{AB} \prec_{\times} T_A T_B$, then $\log\det(I + T_{AB}) \leq \log\det(I + T_A T_B)$,
which concludes the proof. ■
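The inequality (72) is easy to probe numerically. The following Python sketch compares both sides for random positive semidefinite matrices and for a pair sharing the same eigenvectors; it is a sanity check under assumptions (NumPy, natural logarithms, hypothetical helper names `random_psd` and `lemma4_sides`), not a substitute for the proof.

```python
import numpy as np

def random_psd(n, rng):
    # Random complex positive semidefinite matrix G G^H
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return G @ G.conj().T

def lemma4_sides(A, B):
    # Returns (logdet(I + A B), logdet(I + T_A T_B)) with eigenvalues sorted alike
    n = A.shape[0]
    lhs = np.linalg.slogdet(np.eye(n) + A @ B)[1]
    ta = np.sort(np.linalg.eigvalsh(A))[::-1]
    tb = np.sort(np.linalg.eigvalsh(B))[::-1]
    rhs = float(np.sum(np.log1p(ta * tb)))
    return lhs, rhs

rng = np.random.default_rng(1)
gaps = []
for _ in range(200):
    lhs, rhs = lemma4_sides(random_psd(4, rng), random_psd(4, rng))
    gaps.append(rhs - lhs)          # should never be negative
print(min(gaps))

# Equality case: A and B share eigenvectors, eigenvalues ordered alike
U = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))[0]
A_eq = U @ np.diag([3.0, 2.0, 1.0, 0.5]) @ U.conj().T
B_eq = U @ np.diag([4.0, 1.5, 1.0, 0.2]) @ U.conj().T
lhs_eq, rhs_eq = lemma4_sides(A_eq, B_eq)
```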
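Let us prove now that our pair $(A_1^*, \lambda^*)$ satisfies (71). The Lagrangian is defined for the problem as

$$\mathcal{L}(A_1, \lambda^*) = (1 - \lambda^*) \log\det\left( I + A_1 R_{Y_1|Y_0} \right) - \log\det\left( I + A_1 \sigma_r^2 \right) + \lambda^* R. \qquad (73)$$

Recall that $\lambda^* < 1$ and $R_{Y_1|Y_0} = U S U^\dagger$. Then, using Lemma 4 we can bound:

$$
\begin{aligned}
\max_{A_1 \succeq 0} \mathcal{L}(A_1, \lambda^*) &\leq \max_{\eta \succeq 0} \ (1 - \lambda^*) \log\det(I + \eta S) - \log\det\left( I + \eta \sigma_r^2 \right) + \lambda^* R \\
&= \lambda^* R + \sum_{j=1}^{N_1} \max_{\eta_j \geq 0} \ (1 - \lambda^*) \log(1 + \eta_j s_j) - \log\left( 1 + \eta_j \sigma_r^2 \right) \qquad (74)
\end{aligned}
$$

where $\eta$ is the diagonal matrix of ordered eigenvalues of $A_1$. The individual maximizations on $\eta_j$ in (74)
are not concave. However, the continuously differentiable functions $f_j(\eta_j) = (1 - \lambda^*) \log(1 + \eta_j s_j) -
\log(1 + \eta_j \sigma_r^2)$ have only two stationary points, i.e.:

$$
\frac{d f_j}{d \eta_j} = 0 \ \Rightarrow \
\begin{cases}
\eta_j = \infty \\
\eta_j = \frac{1}{\lambda^*} \left( \frac{1}{\sigma_r^2} - \frac{1}{s_j} \right) - \frac{1}{\sigma_r^2}
\end{cases} \qquad (75)
$$

Recalling that $0 < \lambda^* < 1$, it is easy to show that $\lim_{\eta_j \to \infty} f_j(\eta_j) = -\infty$. Therefore $\eta_j = \infty$ is the
global minimum of the problem. Considering the other stationary point, it can be shown that its second
derivative is lower than zero. Accordingly, it is a local maximum, unique because there is no other.
However, we restricted the optimization to the values $\eta_j \geq 0$. Hence, functions $f_j(\eta_j)$ take their maximum at:

$$\eta_j^* = \left[ \frac{1}{\lambda^*} \left( \frac{1}{\sigma_r^2} - \frac{1}{s_j} \right) - \frac{1}{\sigma_r^2} \right]^{+} \qquad (76)$$

$^9$ Notice that the semi-definite multiplier $\Psi$ has been removed from the Lagrangian by constraining the maximization (71) to the
set $A_1 \succeq 0$.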
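Plugging these optimal values into (74), we bound

$$\max_{A_1 \succeq 0} \mathcal{L}(A_1, \lambda^*) \leq \lambda^* R + (1 - \lambda^*) \sum_{j=1}^{N_1} \log\left( 1 + \eta_j^* s_j \right) - \sum_{j=1}^{N_1} \log\left( 1 + \eta_j^* \sigma_r^2 \right) \qquad (77)$$

Furthermore, noticing that for $A_1^* = U \eta^* U^\dagger$:

$$\mathcal{L}(A_1^*, \lambda^*) = \lambda^* R + (1 - \lambda^*) \sum_{j=1}^{N_1} \log\left( 1 + \eta_j^* s_j \right) - \sum_{j=1}^{N_1} \log\left( 1 + \eta_j^* \sigma_r^2 \right), \qquad (78)$$

then, it is demonstrated that $A_1^* = \arg\max_{A_1 \succeq 0} \mathcal{L}(A_1, \lambda^*)$. Hence, the general sufficient condition holds,
and it is optimum. Finally, $\Phi_1^* = (A_1^*)^{-1}$, which concludes the proof.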
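The closed form (76), together with the condition that the multiplier $\lambda^*$ makes the backhaul constraint tight, can be evaluated with a short Python sketch. It uses only the standard library and rests on assumptions made for illustration: rates are in bits (base-2 logarithms), the multiplier is found by bisection on $(0, 1)$, and the function names and the eigenvalues `s` are hypothetical.

```python
import math

def eta_closed_form(lam, s, sigma2):
    # Equation (76): eta_j = [ (1/lam) (1/sigma2 - 1/s_j) - 1/sigma2 ]^+
    return [max(0.0, (1.0 / lam) * (1.0 / sigma2 - 1.0 / sj) - 1.0 / sigma2)
            for sj in s]

def backhaul_bits(eta, s):
    # sum_j log2(1 + eta_j s_j), the rate spent on the backhaul
    return sum(math.log2(1.0 + e * sj) for e, sj in zip(eta, s))

def solve_multiplier(s, sigma2, R, tol=1e-12):
    # The backhaul rate decreases with lam, so bisection finds the tight multiplier
    lo, hi = tol, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if backhaul_bits(eta_closed_form(lam, s, sigma2), s) > R:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return lam, eta_closed_form(lam, s, sigma2)

s = [5.0, 2.0, 1.2]                      # eigenvalues of R_{Y_1|Y_0} (hypothetical)
lam_star, eta_star = solve_multiplier(s, sigma2=1.0, R=3.0)
print(lam_star, eta_star)
```

Eigen-directions whose $s_j$ is too close to $\sigma_r^2$ receive $\eta_j^* = 0$, i.e., they are not compressed at all, which mirrors a reverse water-filling behavior.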

Appendix V 

A. Proof of Proposition 3 

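In this Appendix, we solve the non-convex optimization $A_n^* = \arg\max_{A_n \succeq 0} \mathcal{L}(A_1, \dots, A_N, \lambda)$. First,
recall that $\log\det\left( I + \mathrm{diag}(A_1, \dots, A_N) R_{Y_{1:N}|Y_0} \right)$ is equal to $I(Y_{1:N}; \hat{Y}_{1:N} | Y_0)$ (as shown in (64),
changing $\Phi_n = A_n^{-1}$ $\forall n$). Then:

$$
\begin{aligned}
\log\det\left( I + \mathrm{diag}(A_1, \dots, A_N) R_{Y_{1:N}|Y_0} \right) &= I\left( Y_{1:N}; \hat{Y}_{1:N} | Y_0 \right) \qquad (79) \\
&= I\left( Y_{1:N}; \hat{Y}_n^c | Y_0 \right) + I\left( Y_{1:N}; \hat{Y}_n | Y_0, \hat{Y}_n^c \right) \\
&= I\left( Y_n^c; \hat{Y}_n^c | Y_0 \right) + I\left( Y_n; \hat{Y}_n | Y_0, \hat{Y}_n^c \right) \\
&= \log\det\left( I + \mathrm{diag}(A_1, \dots, A_{n-1}, A_{n+1}, \dots, A_N) R_{Y_n^c|Y_0} \right) \\
&\quad + \log\det\left( I + A_n R_{Y_n|Y_0,\hat{Y}_n^c} \right)
\end{aligned}
$$

where the second equality follows from the chain rule for mutual information, and the third from the Markov
chain in Proposition 2. Finally, the fourth equality evaluates the mutual information as in (64), with
$\Phi_n = A_n^{-1}$. The conditional covariances are computed in Appendix I. Later, using (55) and equivalently
to (67):

$$
\begin{aligned}
&\log\det\left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q \sum_{n=1}^{N} H_{s,n}^\dagger \left( A_n \sigma_r^2 + I \right)^{-1} A_n H_{s,n} \right) \\
&\quad = \log\det\left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q \sum_{j \neq n} H_{s,j}^\dagger \left( A_j \sigma_r^2 + I \right)^{-1} A_j H_{s,j} \right) \\
&\qquad + \log\det\left( I + A_n R_{Y_n|Y_0,\hat{Y}_n^c} \right) - \log\det\left( I + A_n \sigma_r^2 \right). \qquad (80)
\end{aligned}
$$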
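Therefore, plugging (79) and (80) into (18), we can expand the function under study as:

$$
\begin{aligned}
\mathcal{L}(A_1, \dots, A_N, \lambda) &= \log\det\left( I + \frac{Q}{\sigma_r^2} H_{s,0}^\dagger H_{s,0} + Q \sum_{j \neq n} H_{s,j}^\dagger \left( A_j \sigma_r^2 + I \right)^{-1} A_j H_{s,j} \right) \qquad (81) \\
&\quad + \log\det\left( I + A_n R_{Y_n|Y_0,\hat{Y}_n^c} \right) - \log\det\left( I + A_n \sigma_r^2 \right) \\
&\quad - \lambda \left( \log\det\left( I + \mathrm{diag}(A_1, \dots, A_{n-1}, A_{n+1}, \dots, A_N) R_{Y_n^c|Y_0} \right) + \log\det\left( I + A_n R_{Y_n|Y_0,\hat{Y}_n^c} \right) - R \right)
\end{aligned}
$$

In order to obtain $A_n^* = \arg\max_{A_n \succeq 0} \mathcal{L}(A_1, \dots, A_N, \lambda)$, we first notice that the following Lagrangian

$$\mathcal{L}(A_n, \lambda) = (1 - \lambda) \log\det\left( I + A_n R_{Y_n|Y_0,\hat{Y}_n^c} \right) - \log\det\left( I + A_n \sigma_r^2 \right) + \lambda R \qquad (82)$$

satisfies $\arg\max_{A_n \succeq 0} \mathcal{L}(A_n, \lambda) = \arg\max_{A_n \succeq 0} \mathcal{L}(A_1, \dots, A_N, \lambda)$, and it is identical to the Lagrangian
in (73). Therefore, we can directly apply derivation (73)-(78) to solve it:

Consider first $\lambda \geq 1$. For it, $(1 - \lambda) \log\det\left( I + A_n R_{Y_n|Y_0,\hat{Y}_n^c} \right) - \log\det\left( I + A_n \sigma_r^2 \right) \leq 0$, $\forall A_n \succeq 0$.
Therefore, it is readily shown that:

$$0 = \arg\max_{A_n \succeq 0} \mathcal{L}(A_1, \dots, A_N, \lambda) \quad \text{for } \lambda \geq 1. \qquad (83)$$

Let now $\lambda < 1$. Applying (73)-(78) we show that

$$U_n \eta U_n^\dagger = \arg\max_{A_n \succeq 0} \mathcal{L}(A_1, \dots, A_N, \lambda) \quad \text{for } \lambda < 1, \qquad (84)$$

with $R_{Y_n|Y_0,\hat{Y}_n^c} = U_n S U_n^\dagger$, and

$$\eta_j = \left[ \frac{1}{\lambda} \left( \frac{1}{\sigma_r^2} - \frac{1}{s_j} \right) - \frac{1}{\sigma_r^2} \right]^{+}. \qquad (85)$$

This concludes the proof.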

B. Solution of (19) with $\lambda \geq 1$

Applying equivalent arguments to those in (67), we can rewrite the Lagrangian in (19) as:

$$
\begin{aligned}
\mathcal{L}(A_1, \dots, A_N, \lambda) &= (1 - \lambda) \log\det\left( I + \mathrm{diag}(A_1, \dots, A_N) R_{Y_{1:N}|Y_0} \right) \\
&\quad - \log\det\left( I + \mathrm{diag}(A_1, \dots, A_N) \sigma_r^2 \right) + \lambda R.
\end{aligned}
$$

It is clear that, for $\lambda \geq 1$, the Lagrangian takes its optimal value at $\{A_1^*, \dots, A_N^*\} = 0$.
Fig. 1. Multiple-source compression with side information at the decoder.

Fig. 2. Single user capacity results with respect to the backhaul rate. BS$_1$, ..., BS$_6$ cooperate with BS$_0$. (a) CDF versus R, LOS. (b) CDF versus R, N-LOS. Horizontal axis of both panels: User Rate [Mbit/s].

Fig. 4. Outage Capacity with D-WZ and with Quantization, respectively, for different values of the backhaul rate R. LOS.